From the 
Editor-in-Chief 



Hellos and farewells 


This issue contains IEEE Micro's yearly
account of the developments in the European
computer industry. Because of its
multinational environment, the European community
approaches research and development
in a unique way. It uses the ESPRIT program as
a way of developing a technology base in the
area of computers. An excellent example of cooperation,
this program could well be emulated
by other industries and countries interested in
entering the high-technology marketplace.

A majority of the articles in this issue discuss 
accomplishments of the ESPRIT program. I think 
you will agree that guest editors Jean-Francois 
Omnes, Thierry Van der Pyl, and Philip Treleaven 
have done an excellent job of soliciting the 
articles. 

Also, with this issue we welcome a new Editorial
Board member and say good-bye to three
others: Marlin Mickle, Yoichi Yano, and Steve
Dyer. Marlin Mickle has long been the editor of
the New Products department, and Yoichi Yano
has helped in getting articles for the annual Far
East issues. Special thanks go to Steve Dyer,
who has been involved in several special
issues on digital signal processing and who has
been an active board member for the past four
years.

We welcome Ashis Khan to the Editorial Board. 
Ashis works with Mips Computer Systems, and 
he has been active in the organization of several 
conferences. Ashis will be a valuable addition to 
Micro's Editorial Board. His biography appears 
in Micro News this issue. 

I have been associated with IEEE Micro for the
past six years, either as an Associate Editor-in-Chief
or as the EIC. During this time Micro matured
into a magazine of considerable
standing. As with most jobs, however, the time
has come for me to step aside and turn over the
reins to the next caretaker.

My job was made easy by having an extremely
competent, efficient managing editor, Marie
English, and staff editor, Christine Miller. The
Editorial Board members continually contributed
a great deal to the quality of the magazine; they
edited special issues and handled the reviewing
of the vast majority of the articles that we published.
One of Micro's strong points has been its
multitude of departments and the quality of the
writing that goes into each one. All of the department
editors are quality people who have
done an outstanding job. The success of this
publication can be traced to all of these individuals.
I was fortunate to be able to work with
them.

Micro supporters have always been willing to
try new and innovative ideas, and the selection
of someone located outside the US to serve as
the new Editor-in-Chief continues this tradition.
As far as I know, neither the Computer Society
nor the IEEE has had an EIC who is not based in
the States. Dante Del Corso has served as coguest
editor of two issues and lists among his accomplishments
extensive academic publications. He
is a capable engineer and will do an outstanding
job as Micro's Editor-in-Chief. I look forward
to the future issues, and I cannot think of a better
person to be Micro's EIC than Dante Del Corso.
I believe you will come to feel the same.

From the EIC

In the mailbag

February 1990

“I liked [the] special [issue] on Hot
Chips. In general, it is encouraging to
see the trend to practice more than
[to] academic [articles]. Educational
papers have been excellent—keep
them up.” L.M., Caracas, Venezuela
(The Hot Chips issues of Micro were
two of the most popular issues we
have published in some time.—J.H.)

“I liked [the] article on the i486
CPU. Excellent!!!” D.F., Argentina

“I liked this issue—an excellent
one. ‘The Daily Micro’ on the cover: Is
it fictitious or a reality? I’ve not seen
one.” V.R.M., Bangalore, India (As far
as I know, “The Daily Micro” shown
on this cover is not published.—J.H.)

April 1990

“I liked [the] announcement of the
gallium arsenide MC68030 board by
General Micro Systems on p. 91. (1st
April or dream come true?)” G.M., Warsaw,
Poland (This announcement was
received from Motorola and was not an
April Fool’s joke!—J.H.)

“I liked Micro View by Michael Slater. 
Is he trying to say that hardware is 
behind software? I would like to see 
material on commercial products.” F.F., 
Tehran, Iran 

“I would like to see information 
about the IBM RISC system.” A.A.Y., 
Elhadara, Egypt 

June 1990 

“I would like to see [an] occasional
elementary explanation, at [the] kindergarten
level, of commonly used
techniques—for those of us who have
drifted into specialized tracks.” J.E.T.,
Monee, IL (One of the goals that I
wanted to accomplish as Micro's EIC
was to publish more tutorial material
in the magazine. To date, I
have not been able to accomplish
this. Writing a tutorial is a difficult
task, and so is reviewing one.—J.H.)

“I would like to see more on 
microcomputer networking, please!” 
S.K., Santa Clara, CA 

“I think you publish a good mix
of papers; I do enjoy reading papers
written by people who have
designed the products they are
discussing. They have an opportunity
to say why they did what
they did in a way that does not
always appear in instruction books,
application notes, etc. I have not
had difficulty identifying authors
who are working for the company
mentioned.” S.H., Buffalo, NY


Call for 
Papers 

IEEE Micro seeks 
manuscripts for 
1991 and 1992 

Submit manuscripts to: 

Joe Hootman, Editor-in-Chief, EE Dept., 
University of North Dakota, 

PO Box 7165, Grand Forks, ND 58202, 
phone (701) 777-4331. 


GENERAL-INTEREST 

TOPICS 

• Biological computing 

• Artificial intelligence 

• VHDL design and workstations 

• Operating systems 

• Multiprocessing 

• Microcomputing to aid the 
handicapped 

• Optical computing 



Micro 
World 
Research Center
Switzerland


Reunifying East-West electronics industries 


It is hard for the best pupil in one class to
transfer to a more advanced class. Often
this once-successful student makes subaverage
grades in the new class. The struggle to
catch up takes hard work, evening study, and a
lot of courage.

Today, this is the experience of East German
engineers. A year ago, they set the standards for
the Eastern European electronics industry. Their
engineers helped to make space exploration
possible. Their electronics industry—represented
by the state-owned VEB (Volks-Eigene Betriebe,
or people-owned factories)—maintained a monopoly
on electronic appliances and telecommunications.
The VEB Mikroelektronik Kombinat
manufactured 4-Mbit DRAMs (dynamic RAMs)
and 32-bit processors. ASICs (application-specific
integrated circuits) came in 1.5-µm CMOS
(complementary metal-oxide semiconductors).
The VEB Robotron Kombinat produced about
100,000 personal computers per year.

Generous support from the former German
Democratic Republic (GDR) government made
the electronics sector the flagship of its entire
national industry. In fact, the Communist government
aimed to make the electronics industry
autonomous within a few years.

Probably no other country in the world invested
so heavily per inhabitant in microelectronics.
With more than 130,000 employees in
both the microelectronics field and the electronics
industry, East Germany rivaled West Germany
in numbers of people working in electronics-related
areas.

Upon further investigation, however, one finds
these statistics a little misleading. For example,
64-Kbit DRAMs and 8-bit, Z-80 lookalikes make
up the bulk of East German series production.
Half of the IBM compatibles produced are still
8-bit machines. The Robotron computers copy
the IBM 370 and VAX 11/780, which lag 10 years
behind the times. East German component
technology lags an estimated five years behind
that of West Germany, which
itself is not a world leader. Appliances and process
control equipment are a decade behind; in
some cases the technological gap with Western
Europe spans nearly 15 years.

As the Berlin Wall tumbled down, one immediately
perceived the GDR’s inability to compete
with the electronics industry in the West. In spite
of massive price slashes, East German citizens
refused to buy otherwise acceptable radios built
in their country because of the radios’ appearance,
their quality record, and their association with
bad times.

An example 

The competition and reunification process of
the two German industries is beautifully illustrated
by the Osram light bulb factory in Berlin.
Split in two by the creation of the two Germanies
in 1949, each of Osram’s half-firms worked independently
for 40 years, supplying goods to
its own market. In December
1989, prior to the reunification of the countries,
Osram executives evaluated reuniting the firms.

Employees of one firm visited the other. The
first question of the Eastern visitors was, “Where
are the people?” Osram’s fully automated production
facility in West Berlin needed no production
personnel except for maintenance and supervisory
control workers. In contrast, the East Berlin
plant resembled a crowded marketplace. These
employees accomplished production by hand,
even using machines of 1949 vintage. Each Eastern
factory worker achieved only one tenth of the
production of the Western factory employees.
Poor quality and lack of quantity resulted.
The West Berlin employees
could not believe that people worked
under such conditions. They also wondered
about employee salary levels that
allowed the East Berlin firm to sell the
bulbs so cheaply.

Under these conditions, the West
Osram declared—contrary to its initial
statements—that reuniting with the
Eastern firm entailed employee layoffs
and reduction of the considerable
benefits to workers (including free
medical care and free kindergarten).
Around the time of this announcement,
the introduction of the free exchange
of goods brought an influx of Western
goods into East Germany. Sales of East
Osram bulbs plummeted to zero—even
with dramatically slashed prices—because
East Germans preferred West
German bulbs.

Consequently, the Eastern factory
ceased operations. West Osram refused
to invest in it or absorb it. Covering
the Eastern market required only a
moderate increase in the firm’s automated
facilities, and West Osram did
not desire the know-how of East German
engineers.

The world crumbled for all East
Osram employees: no more job security,
loss of social support, and a cost of
living soaring to West German levels.
Many former employees emigrated
to the West—the younger workers
training for new vocations, the older
ones hoping the West German social
security net would adequately support
them.

The East and West Osram situation
illustrates the massive shake-up occurring
with German reunification. For
their part, Westerners now wonder
how much they will pay for this
process.

The press analyzes the extreme divergence
of the two Germanies either
by saying, “I always said it would end
up this way” or “They would have
made it if....” East Germans themselves
provide the best analysis. Professor
Wolfgang Marschall of the Academy of
Science of the GDR presents an excellent
case for his position.¹ The East
German analysts agree that the lack of
pressure from their country’s market
allowed uneconomical firms to develop.
Low per-capita productivity resulted
because full employment
received a higher priority than productivity.
Government subsidies for the
flagship electronics industry occurred
at a cost to other national industries.
Planning ignored the rules of international
trade. Isolation from the Western
market and the scientific community
also played important roles in the
breakdown.

East meets West 

In the electronics industry, East German
engineers unquestionably match
the skill of their Western colleagues.
In fact, considering the poor conditions
under which the Easterners worked for
years, one can only be amazed that
the technology they developed lags
behind the West by only a decade and
a half. The quality of teaching and the
theoretical work at the Academy of
Science compares favorably to Western
standards—even without rich
endowments.

Contrary to some opinions, the motivation
of East German engineers remained
high even before talk of
reunification. Instead of satisfying economic
objectives, they optimized their
work in a spirit of competition with
the government plan. For instance, the
designers cared little how much their
products cost: the cost per transistor
function was 10 times
higher than the international level,
DRAM production yields were below
10 percent, and yields of 32-bit
microprocessors were under 0.01
percent.
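
To see why such yields translate into uneconomical parts, a rough sketch helps: the cost of each working part grows in inverse proportion to the yield. The Python fragment below is only an illustration. The wafer cost and the number of dies per wafer are made-up placeholders, the function name is hypothetical, and only the two yield bounds come from the text.

```python
def cost_per_good_die(wafer_cost: float, dies_per_wafer: int,
                      yield_fraction: float) -> float:
    """Cost of one working die when only yield_fraction of all dies work."""
    if not 0 < yield_fraction <= 1:
        raise ValueError("yield_fraction must be in (0, 1]")
    return wafer_cost / (dies_per_wafer * yield_fraction)


# Hypothetical placeholders, not figures from the article.
WAFER_COST = 1000.0    # arbitrary currency units per processed wafer
DIES_PER_WAFER = 100   # dies patterned on each wafer

# Yield bounds quoted in the text: below 10 percent for DRAMs,
# under 0.01 percent for 32-bit microprocessors.
for label, yield_fraction in [("DRAM, 10 percent", 0.10),
                              ("32-bit CPU, 0.01 percent", 0.0001)]:
    cost = cost_per_good_die(WAFER_COST, DIES_PER_WAFER, yield_fraction)
    print(f"{label}: {cost:.0f} units per good die")
```

Whatever the real wafer cost was, moving from a 10 percent yield to a 0.01 percent yield multiplies the cost of each good part by a factor of 1,000.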

VEB maintained uneconomical production
lines simply because the government
plan required them in order to supply
a few special parts, such as those used
by the army. Although perfectly conscious
of the economic nonsense, the
executives saw no alternative
since their plant almost solely supplied
the Eastern market.

Indeed, although the international
market stocked most of these specialized
parts, NATO’s Cocom (coordination
committee) embargo closed the free
market to East Germany. Ironically,
Cocom not only restricted the East
German electronics industry from purchasing
parts from the West, it also allowed
the industry to subsist without an economic
basis.

Of course, when firms really needed
Western computers and components,
they received them through a rather
tedious process. We know now that
the GDR’s special department, the
KoKo (Kommerzielle Koordination),
used the “gray market” to supply East
German firms with items prohibited by
Cocom. Use of this channel resulted in
expensive supplies and slower development
of technology. For instance,
one can trace the poor yields of DRAMs
to a successful Cocom action to prevent
the GDR from obtaining semiconductor
manufacturing machines
from the West.

The East German development departments
lacked modern design tools
and advanced components. Advanced
components, if produced at all, ended
up going to the military. Lack of currency
and export restrictions hampered
the acquisition of tools and components
from the West. English language
barriers kept the GDR engineers from
obtaining fresh news about their field
(Russian is the usual second language).
They relied on West German electronic
magazines as their prime source of information.
The general fear of speaking
openly about one’s work hindered
the interchange of information—especially
with Russian scientists.

Even now, the VEB engineers do not know
how many laboratories in Russia work
in the same domain as themselves.
Also, the regime reserved participation
in Western conferences for its loyalists,
whose preoccupation with smuggling
goods took precedence over
exchanging information.

The East German labs generated
many good ideas, but translating ideas
into a market product was an exception.
By comparison, Marschall estimates
Japan’s transfer efficiency is “70
times better” than East Germany’s. In
fact, GDR politicians viewed translation
to products as unimportant; they
would rather boast about their
high-technology 4-Mbit RAMs than about
market sales.

Actually, sales were good, but customers
in the Eastern European
(Council for Mutual Economic Assistance,
or Comecon) countries paid in
soft currencies, while technology purchases
required hard currencies. The
“sales” departments ended up as
“customer keep-away” departments,
since demand far exceeded production.
As a result, the firm determined the
priority of its customers.


The lack of building maintenance,
the struggle to acquire parts, poor telephone
service, and the unreliable post
office consumed a large portion of the
working day. Administrative red tape and the
dust of bureaucracy added to this
gloomy picture. One learns that a country
cannot sustain a high-tech industry
amidst a broken-down economy. High-tech
industries depend on a functioning
infrastructure. Many developing
countries make the same mistakes.

The responsibility for this situation
falls not entirely on the former GDR
government, but rather reflects the
failure of a system. Before speaking of
the superiority of one system over another,
one should remember that the
symptoms and problems experienced
by the East Germans also arise in
Western industries, especially in large
conglomerates. In these organizations,
the responsibility of individuals often
blurs and strategic decisions sometimes
defy common sense. Every now and
then, the invoking of arguments similar
to those of the Eastern government
hinders the free interchange of information
and goods. The difference is
this: Market laws in the West rapidly
eliminate unhealthy firms. Finally, one
must recognize that the East German
industry collapsed not just because of
its own evils. The technology blockade
and the diversion of resources for
the arms race at least worsened the
situation significantly.

What will happen next? 

The rejuvenation of Eastern firms
seems quite uncertain. After a first
phase of enthusiastic talks of cooperation,
Western companies now either
ignore Eastern firms or buy them to
secure market share. As a result, few
Eastern firms will survive. The microelectronics
and computer industries
may stay in business because, apart
from the optical industry, they are the
only sectors that attain international
standards. However, the future of VEB
Mikroelektronik remains clouded, even
if it gains membership in the European
semiconductor project Jessi (Joint European
Submicron Silicon). One expects
massive layoffs at VEB.

The Eastern engineers will help to
fill the current shortage of design engineers
in West Germany. Similar to what
Czech engineers experienced after their
defection to the West during the 1968
uprising in Prague, qualified East German
engineers will obtain key positions
in the Western industry within a
few years. They bring a resourcefulness
not often found in the West.

The market of East Germany will
expand greatly in the next several years.
For instance, estimated demand for
personal computers totals 150,000 to
200,000 machines a year. It will take
cooperation between Western companies
and Eastern firms (now likely operating
as sales offices) to satisfy this
demand.

Interest in the German reunification
goes beyond the two affected countries.
Reunification presents a laboratory
for people to study the possible
reunification of Europe, including most
of the Soviet Union—a market second
only to China in terms of population.
Europe faces its second post-World War
II reconstruction, and East Germany
may become the bridge between East
and West.


Reference

1. W. Marschall, "DDR-Mikroelektronik—

Guest Editors’ Introduction

Parallel Computing in Europe

CEC ESPRIT
University College London

The European Economic Community
supports a large number of research
projects investigating the design of
parallel computing systems. The European
Strategic Programme for Research and
Development in Information Technology (ESPRIT) funds
many of these projects.¹,³ ESPRIT has the following
objectives:

• to provide the European information technology
industry with the basic technologies
needed to meet the competitive requirements
of the 1990s,

• to promote European industrial cooperation
in information technology, and

• to pave the way for standards.

Projects selected from public calls for proposals
and based upon an annually updated Workplan
implement ESPRIT’s goals. The program comprises
collaborative, precompetitive research and
development projects, which are carried out across
frontiers by Community companies, universities,
and research institutes. The European Community
budget funds 50 percent of a project, and
project participants fund the other 50 percent.

A consortium of companies and universities,
each containing at least two independent industrial
partners from different member states,
undertakes each project. Projects vary greatly
in composition. A large project might involve
three companies and three universities from
different member countries of the European
Community, with funding of 10 million ECUs and a span
of three years. (As of October 1990, 1 European Currency
Unit equals 1.31 US dollars.) Projects subdivide according
to focus into A-type projects that emphasize research and
development, and B-type projects emphasizing longer term
applied research.
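
The cost-sharing rule and the exchange rate quoted above can be turned into a quick worked example. The following Python sketch is an illustration only, not an official ESPRIT calculation. The function and variable names are assumptions made for this example, and the figures are the ones given in the text: a 10-million-ECU project, a 50 percent Community share, and 1.31 US dollars per ECU.

```python
from decimal import Decimal

# Figures quoted in the text.
ECU_TO_USD = Decimal("1.31")   # exchange rate as of October 1990
EC_SHARE = Decimal("0.50")     # share of a project paid by the Community budget


def split_funding(total_ecu: Decimal) -> dict:
    """Split a project's cost between the Community budget and the partners."""
    ec_part = total_ecu * EC_SHARE
    partner_part = total_ecu - ec_part
    return {
        "ec_ecu": ec_part,
        "partners_ecu": partner_part,
        "total_usd": total_ecu * ECU_TO_USD,
    }


# The "large project" example: 10 million ECUs over three years.
large_project = split_funding(Decimal("10000000"))
for name, amount in large_project.items():
    print(f"{name}: {amount:,.0f}")
```

Decimal arithmetic is used instead of floating point so that the currency amounts stay exact.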

The first phase of ESPRIT, called ESPRIT I,¹,² was a five-year
research program lasting from 1984 to 1989. The total
research and development effort of the first phase amounted
to approximately 1.5 billion ECUs in funding. ESPRIT I
launched 226 projects. At its peak, 3,000 engineers and
scientists from 420 independent organizations worked full-time
on ESPRIT I projects.

The second phase, called ESPRIT II,²,³ started in 1988 and
continues with the same broad objectives. A larger scale
operation, it still preserves the mechanisms used by ESPRIT
I. These include cost-sharing between the European Community
and partners, the principles of consensus building
and on-going assessment, and the Workplan. The total ESPRIT II
budget amounts to approximately 3.2 billion ECUs,
or a project cost per year of approximately 800 million ECUs.

The first-call ESPRIT II proposals met with an unprecedented
response. The total requested funding exceeded the available
budget by a factor of eight. After an evaluation process
by independent experts, ESPRIT II advisers selected 156 proposals,
involving 585 participating organizations. A particularly
encouraging sign comes from the interest of small and
medium-size enterprises (SMEs) in ESPRIT. Between ESPRIT
I and ESPRIT II the number of participating SMEs (defined as
having fewer than 500 employees) rose from 180 to 290. Of
these SMEs almost half have fewer than 50 employees.
Of the 315 independent information technology vendors
participating in ESPRIT II, 65 percent qualify as SMEs.

In addition to its main research program, ESPRIT II maintains 
so-called Basic Research Actions, which concentrate on 
fundamental research. Basic Research Actions are small 
research projects mainly undertaken by universities and 
research institutes throughout Europe. 

In Europe, ESPRIT is the focus for research into parallel
computers and new programming languages. ESPRIT devotes
about 10 percent of its resources to parallel systems.
This issue of IEEE Micro presents five significant research
projects on parallel computing. We chose these projects
to reflect the diversity of approaches.

The first article by Odijk of the Philips Research Laboratories
in The Netherlands describes ESPRIT project 415, entitled
“Parallel Architectures and Languages for Advanced Information
Processing—A VLSI-Directed Approach.” This project,⁴⁻⁶
which began in 1984 and ran for five years, investigated and
compared the major approaches in designing high-performance
parallel computer systems: object-oriented, functional,
and logic. It brought together six major European information
technology companies, Philips, AEG, GEC, Bull, CSELT,
and Nixdorf, which were supported by a number of outstanding
research organizations.

Whitby-Strevens, in his article about the Inmos transputer,
examines the continuing development of this family of
microprocessors.⁷ The transputer family fulfills four main
objectives: It creates a product range that is easy to program
and engineer; it provides maximum performance to
the user; it utilizes increasing levels of VLSI integration;
and it creates a programmable component for large parallel
systems. This article begins by reviewing the basic parallel
programming and architecture concepts of the transputer
that led to the next-generation transputer, codenamed H1.
It also outlines some of the ESPRIT projects that are assisting
the transputer’s development. One, the Supernode project,
produced a range of parallel computers with up to 1,000
processors. The article also describes two innovative
transputer-based applications; one is a hand-held satellite
navigation system and the other, the Marconi Martello long-range
radar system.

The third article, by Haworth, Leunig, Hammer, and Reeve,
describes the European Declarative Systems (EDS) project.⁸ A major
technology integration project within ESPRIT II, EDS brings
together Bull, ICL, Siemens, their jointly owned ECRC research
center, and many universities across Europe. A major
goal of EDS is the production of a high-performance, parallel
relational database server. The EDS system being developed
will support business information across a spectrum from
data to knowledge, and manage it intelligently. To this end, it
will support Unix, extended SQL, Lisp, C++, and the Elipsys
parallel logic programming language.

Rounce from University College London and Delgado of
Inesc Portugal describe the SPAN project,⁹ which investigated
the integration of parallel symbolic and numeric processing.
The project covered parallel applications, languages,
and architectures. At the core was the so-called Kernel System,
a target machine language (Parle), and its virtual machine.
This core served as the focal point for compiling the
symbolic and numeric languages onto parallel numeric and
symbolic machines. The article presents two architectures,
Sprint, a parallel microprocessor, and DICE, the
parallel object-oriented system, both developed as part of
SPAN.

Lastly, the article by Angeniol, formerly of
Thomson-CSF and now of Mimetics in France, presents the
Pygmalion neural computing project.¹⁰ The objectives of
Pygmalion are 1) to demonstrate to European industry the
potential of neural networks for applications in image, speech,
and acoustic signal processing; and 2) to develop a European
“standard” neural network programming system. This system
comprises a graphical environment, library of common algorithms,
and high-level language. Pygmalion brings together
many of the leading neural computing research groups from
European industry, research institutes, and universities. These
include Thomson-CSF, CSELT, Philips, SEL Alcatel, Siemens,
and universities from England, France, Portugal, and Spain.
Parallel Computers for
Advanced Information Processing


ESPRIT project 415 involved the study of object-oriented, functional, and logic programming
styles in six subprojects. We describe the parallel languages and architectures designed to
implement them as well as our parallel object-oriented system. This design includes a novel
language, decentralized memory architecture, and system software.

Parallel computing efforts are probably
as old as the history of computing
itself. For a long time performance
requirements of numerical programs,
for use in solving scientific and engineering
problems, drove these efforts. In the early 1980s,
however, research programs began to apply parallelism
to the wide class of symbolic programs.
The Fifth-Generation Computer System program
in Japan aimed at a very ambitious leap toward
an information society through the combination
of artificial intelligence and parallel computing.

Europe soon followed and initiated its ESPRIT
I program from 1984 to 1989. ESPRIT’s distributed
approach identified important topics and
formed consortia to share with the scientific and
industrial bodies in Europe. A significant part of
this program concerned knowledge engineering
and advanced architectures under the heading of
advanced information processing (AIP). From the
start, the European information technology industry
actively participated.

In late 1984 the first and largest project on parallel
symbolic computing started under ESPRIT
number 415. The project carried the title of Parallel
Architectures and Languages for Advanced
Information Processing—a VLSI- (very large scale
integration) oriented approach. Unlike most other
projects, 415 did not dictate one parallel technology
but aimed at investigating and comparing, in
a language-first approach, the major approaches
in designing high-performance parallel systems
for AIP: logic, functional, and object-oriented. Six
of the 12 major European information technology
companies, supported by a number of outstanding
research organizations, carried out the project.

In each of six subprojects the researchers investigated
a specific style of parallel processing.
Their goals were to achieve a parallel implementation
for a (novel) programming language and
demonstrate the performance improvements in
some selected symbolic applications. Prototypes
were delivered, ranging from four to 100 processors
and showing good speed-ups as well as
absolute speeds.

Before we describe in detail the technology of 
the object-oriented approach pursued by our 
team, we offer an overview of project 415. We 
discuss its starting points, its approaches, and its 
results. In particular, you will find it interesting to 
see which common conclusions were reached 
among the different styles. 

Symbolic parallel computing 

Numeric algorithms, characterized by regular
data structures and uniform operations on them,
benefit from the parallel execution means of vector
computers. These algorithms also benefit from
shared-memory multiprocessor systems, supported
by parallelizing compilers and careful
tuning by the programmer.

The new category of symbolic applications,
partly overlapping with the general denominator of artificial
intelligence, requires programming styles and languages with
improved expression power as well as more dynamic and
flexible execution mechanisms. Additionally, high-level concepts
of communication and synchronization that resulted from
research in the 1970s needed to be integrated in such languages
to achieve effective programming for and efficient exploitation
of parallelism.

Table 1. Partners, subcontractors, and approaches.

Subproject  Partners  Subcontractors                  Contributions
A           Philips   CWI, Amsterdam                  Object-oriented machine Pooma
                      Technical University Eindhoven
                      University of Oldenburg
            AEG       Technical University of Berlin  VLSI simulator on Pooma
B           GEC       University College London       Lazy functional languages,
                      Imperial College London         reduction machine
C           BULL      LITP, University of Paris       Logic database machine
                      ESIEE, Noisy-le-Grand
D           CSELT     University of Pisa              Mixed-logic functional machine
E           Nixdorf   Stollmann, Hamburg              Dataflow machine
F           Nixdorf   LIFIA-IMAG, Grenoble            Connection method
                      Technical University of Munich  logic machine

These new concepts and the desire for a scalability of at
least two orders of magnitude of parallelism necessitated a
broad scope of research. By scalability, we mean that the
concepts hold for a large range of machine sizes (number of
processors). We researched the simultaneous and integral design
of parallel architectures and languages for symbolic
applications. Representative instances of the latter would prove
the expressiveness of the languages and the achieved performance
of the parallel systems. Finally, researchers recognized that a
better understanding of the theoretical foundations of the
parallel programming styles and concurrency in general would
improve the quality of language and program design.

Project 415’s charter, as reflected in its full title,
acknowledges the above observations.

Project participants considered a number of programming 
models to be potentially feasible, but neither the validity nor 
even the superiority of one of them had been established. 
Therefore, they decided to investigate a number of distinct 
parallel programming styles and propose architectures and 
languages to support these styles. They defined subprojects, 
each to be executed by one partner and its academic subcon¬ 
tractors (see Table 1). 

Projectwide working groups formed in the areas of architecture
and applications, semantics, and proof theory and verification.
The groups provided a platform for the presentation and
discussion of mutually important topics and advanced the theory
of the concepts involved. The project comprised some 280
man-years, 20 percent of which were dedicated to the working
groups.

The project emphasized education and exchange of scientific
results in this area, eventually leading to the organization of
two summer schools 1,2 and two Parle conferences. 3,4 The
proceedings of the latter include reports on the subprojects.

Before describing the goals and technical directions of the
subprojects, let’s discuss the issues at stake in a general
manner. In a language-first approach, the feasibility of the
selected programming model forms the hypothesis for each of the
approaches. An abstraction of the corresponding implementation
is the execution model. Table 2 provides a survey of these and
links a number of attributes.

Logic and functional programming models belong to the class of
declarative systems, while object-oriented programming falls
within the imperative programming class. Declarative languages
are considered to be higher level, more expressive, and concise,
and they allow formal assessment due to their strong
mathematical basis. Until recently, however, their efficient
implementation has not been well understood.

Two categories of parallelism exist: explicit and implicit. In
the latter case the implementation (compiler and operating
system) extracts an amount of parallelism from the program.
With explicit parallelism (for example, in object-oriented
programming) the programmer holds responsibility for
partitioning a program into objects. To avoid burdening the
programmer, the language should offer high-level, natural
facilities for communication and synchronization. Implicit
parallelism is in principle present in the execution model of
declarative languages. Thus, programmers would be able to
concentrate on the algorithm per se. The compiler has to carry
the burden of extracting enough parallelism with a coarse-enough
granularity.

Table 2. Execution models and attributes for subproject system.

Subproject  Language style, operational model      Parallelism  Architecture
A           Object-oriented, control flow          Explicit     Distributed memory, packet-switching network
B           Functional, graph reduction            Implicit     Emulation on transputer
C           Logic (many data), delta driven        Implicit     Distributed memory, bus based
D           Logic + functional, SLD resolution     Explicit     Shared memory, multistage network
E           Single assignment, dataflow            Implicit     Shared memory, bus based
F           Logic (many rules), connection method  Implicit     Shared memory, bus based



The preferred parallel architecture relates to the program¬ 
ming model and the required size and scalability of the par¬ 
allelism. A general choice falls between shared-memory and 
distributed-memory architectures, and (mostly coinciding) 
between communication and synchronization through shared 
variables or message passing. Other choices relate to the 
amount of hardware support for specific features of the ex¬ 
ecution model. An important unifying concept in the project 
is that of the homogeneous parallel machine, which consists 
of a number of identical nodes. This concept provides for 
extensible systems and allows an implementation to take 
advantage of the relatively low replication costs of VLSI-based 
node realizations. 

A common characteristic in the implementation of the 
various language models is the use of a dynamically varying 
number of processes. Usually, a processor carries out many 
processes, mostly with dynamically changing patterns of 
communication and interaction. This characteristic requires 
an operating system (kernel) to be resident on each of the 
nodes to manage the various resources at the node and sys¬ 
tem level. 

Thus, a number of common and related problems arise, 
while their solution may differ with the language model. The 
subprojects directed their individual efforts at solving these 
problems in the contexts of the specific programming models. 

Goals and results of subprojects 

Here, we describe each of the 415 subprojects on behalf of 
our partners, as indicated in Table 1. 

The object-oriented approach. The essence of object-oriented
programming is the subdivision of a system into objects, that
is, integrated units of data and procedures. Objects communicate
by passing messages, each of which is interpreted as a request
from the sending object to the receiving object to execute a
certain procedure (called a method). To combine object-oriented
languages with parallelism, we chose to associate with every
object a process of its own. The model is intuitively appealing,
with message passing as the only facility for communication
between objects.
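
As an illustration only, the following Python sketch mimics the
model just described: each object owns its data, runs a process
of its own (here a thread), and reacts to messages that name a
method. It is not POOL. The names ActiveObject, Counter, send,
and demo are hypothetical, and a thread with a queue merely
stands in for a POOL object and its message passing.

```python
import queue
import threading


class ActiveObject:
    """Hypothetical base class: an object with a mailbox and its own thread."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, method, *args):
        """Request execution of a method; returns a queue that will hold the reply."""
        reply = queue.Queue(maxsize=1)
        self._mailbox.put((method, args, reply))
        return reply

    def stop(self):
        """Ask the object's thread to finish and wait for it."""
        self._mailbox.put(None)
        self._thread.join()

    def _run(self):
        # The object's own process: handle one message at a time, in arrival order.
        while True:
            message = self._mailbox.get()
            if message is None:
                break
            method, args, reply = message
            reply.put(getattr(self, method)(*args))


class Counter(ActiveObject):
    def __init__(self):
        self.total = 0  # private data, touched only by this object's thread
        super().__init__()

    def add(self, amount):
        self.total += amount
        return self.total


def demo():
    counter = Counter()
    replies = [counter.send("add", n) for n in (1, 2, 3)]
    results = [reply.get() for reply in replies]
    counter.stop()
    return results
```

Because only the object's own thread ever touches its data, the
sender needs no locks; all coordination happens through the
mailbox, which is the point of making message passing the only
means of communication.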

Such a system’s architecture would consist of many functionally
identical, control-flow-structured computers connected via a
direct message-passing network. Each computer contains a central
processor, private memory, and a dedicated communication unit
that performs the message passing without interfering with the
CPU. The concept allows systems of more than 1,000 such
computers.

The major goals of this subproject (A in Table 2), derived 
from the chosen approach, follow: 

• A parallel object-oriented language POOL, in which sig¬ 
nificant application programs can be programmed. The 
language provides the user with control of parallelism 
and granularity. The language must have clear, formal 
semantics. Support for the verification of programs is 
desirable, since it is even more important in a parallel 
environment than in a sequential one. 

• A prototype parallel object-oriented machine, Pooma, 
which consists of 100 identical self-contained comput¬ 
ers. Each computer has a powerful 32-bit processor, lo¬ 
cal memory, and communications means, which are 
connected in a direct packet-switching network. Since a 
Pooma node executes many (10 to 100) processes, the 
processor architecture must support multiprocessing. 
Each node of the system contains a copy of the operat¬ 
ing system kernel. This kernel performs local resource 
management and cooperates with the other kernels for 
global operating system tasks. The prototype Pooma 
system connects as a satellite to a host computer, in 
which the programming environment resides. The pro¬ 
totype, based on existing technology, will offer facilities 
for experimenting and for evaluation of performance 
aspects. 

• Three significant applications in the area of symbolic
processing that demonstrate the performance increase
through parallelism on Pooma. The first of these is a
parallel theorem prover. The second is a parallel version
of the analytical component of the Rosetta natural language
translation system. The Computer Science department at
Philips Research currently develops Rosetta. The third
application is a multilevel VLSI circuit simulator
designed at AEG (described later).

We achieved these goals. Currently, a 100-node Pooma system is
operational, based on the 68020 processor and a proprietary
communication processor. The latter performs, in hardware,
complete end-to-end routing for fixed-size packets, using
alternative paths to avoid congestion. The built-in routing
mechanism is free of deadlock. It gives the higher layers of
the system a powerful and efficient mechanism and avoids
interrupting the data processors of intermediate nodes.

POOL2 serves as the programming language for Pooma. A 
later version, POOL-X, a superset of POOL2, lets users pro¬ 
gram the database system of the Prisma project, also at Philips 
Research. In the course of the project, we defined both the 
operational and denotational semantics for POOL and proved 
that they are equivalent. Furthermore we designed a sound 
and complete formal proof system. The POOL implementa¬ 
tion on Pooma consists of an operating system kernel and a 
compiler. Furthermore we developed a portable POOL 
implementation called ptc, which provides pseudoparallel 
execution on sequential systems and contains extensive sup¬ 
port for debugging. 5 We implemented all compilers using 
Elegant, 6 a set of newly designed construction tools for parsers 
and code generators based on lazy evaluation in attribute 
grammars. 

We obtained designs and implementations of a number of 
applications. Among these are the above-mentioned theo¬ 
rem prover and analytical component for natural language 
translation. First evaluations show promising speed-ups. 

A VLSI simulator in POOL. Another main goal of 
subproject A is a new parallel, multilevel simulator (PMLS) 
for VLSI circuits. 7 PMLS runs on general-purpose parallel ma¬ 
chines and combines multilevel simulation and exploitation 
of parallelism to achieve optimal performance. The initial 
implementation of the simulator is in the object-oriented POOL 
language for the parallel Pooma machine. Powerful simula¬ 
tors are a key element in today’s and future digital circuit 
design environments and a significant application for future 
parallel machines. 

PMLS focuses on the logic design levels—register transfer, 
functional gate, and switch—with provisions for including 
the programming and the electrical levels. The simulator uses 
one simulation concept for all levels. This broadband con¬ 
cept is characterized by distributed discrete-event simulation 
and the partitioning of the circuit into subcircuits, which 
subsimulator processes simulate in parallel. Each subcircuit 
may contain elements at different abstraction levels. 

Subsimulators execute asynchronously using their own local
time, communicate by exchanging event messages, and synchronize
by using the so-called time-warp approach. In this approach, a
subsimulator always runs forward at full speed, but when it
receives an event message carrying a time stamp less than the
local simulation time, the subsimulator rolls back to the
earlier time. The concept delivers sufficient grain size to gain
significant speed-up on a parallel general-purpose machine like
Pooma. The object-oriented implementation allows the highly
dynamic features of object creation and object interconnection
for dynamic simulation (for example, incremental simulation and
dynamic change of the abstraction level, or zooming). It also
provides high flexibility.
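
The rollback rule can be sketched in a few lines of Python. This
is a minimal illustration under stated assumptions, not the PMLS
code: the class Subsimulator, the function run, and the toy
state update are all hypothetical, and the sketch leaves out the
cancellation of messages already sent to other subsimulators,
which a full time-warp implementation also needs.

```python
class Subsimulator:
    """Optimistic subsimulator that rolls back when a late event arrives."""

    def __init__(self):
        self.local_time = 0
        self.state = 0
        self.processed = []        # (time stamp, value), in time-stamp order
        self.snapshots = [(0, 0)]  # (local_time, state) saved after each event
        self.rollbacks = 0

    def receive(self, timestamp, value):
        redo = []
        if timestamp < self.local_time:
            # Late event: undo every event processed with a later time stamp.
            self.rollbacks += 1
            while self.processed and self.processed[-1][0] > timestamp:
                redo.append(self.processed.pop())
                self.snapshots.pop()
            self.local_time, self.state = self.snapshots[-1]
        self._execute(timestamp, value)
        # Re-execute the undone events in time-stamp order.
        for late_time, late_value in reversed(redo):
            self._execute(late_time, late_value)

    def _execute(self, timestamp, value):
        self.local_time = timestamp
        # Toy, order-dependent update so that processing order matters.
        self.state = self.state * 2 + value
        self.processed.append((timestamp, value))
        self.snapshots.append((self.local_time, self.state))


def run(events):
    """Feed (time stamp, value) events to a fresh subsimulator in arrival order."""
    sim = Subsimulator()
    for timestamp, value in events:
        sim.receive(timestamp, value)
    return sim
```

Feeding the events (10, 1), (30, 3), (20, 2) in that arrival
order triggers one rollback and leaves the same state as
processing them in time-stamp order would.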

We achieved this goal. Currently PMLS is in the final
implementation stage: we have implemented the main parts of the
simulator in POOL. Work continues on the advanced features
(incremental simulation, zooming) and on the tuning of the
simulator.

We use the POOL interpreter as well as the Pooma proto¬ 
type to measure the execution behavior of the simulator. 
First evaluations show promising speed-ups. In addition, we’ve 
partially integrated the prototype version of PMLS into the 
existing environment of AEG’s Disim, a commercial VLSI 
simulation system. 

We therefore have a complete system, a parallel object- 
oriented program implemented on a parallel machine and 
partially integrated into the environment of a commercial 
simulation system. We’ve gone even beyond our original goal. 

A further result of the project is a parallel fault simulator
(PFS). In PFS, fault sets are dynamically split into subsets,
which are then processed as parallel jobs on the nodes of a
local-area network, or LAN. PFS achieves a near-linear speed-up.
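
The idea of splitting a fault set into subsets that run as
parallel jobs can be sketched as follows. This is a hedged toy,
not PFS itself: the three-input circuit, the stuck-at fault
model, and all function names are assumptions made for
illustration; threads stand in for LAN nodes; and the split is
fixed in advance rather than dynamic.

```python
from concurrent.futures import ThreadPoolExecutor


def circuit(a, b, c, stuck=None):
    """Toy circuit out = (a AND b) OR c; `stuck` forces one signal to a value."""
    signals = {"a": a, "b": b, "c": c}
    if stuck is not None:
        name, value = stuck
        signals[name] = value
    return (signals["a"] & signals["b"]) | signals["c"]


def simulate_subset(faults, vector):
    """One job: report the faults in this subset that the test vector detects."""
    good = circuit(*vector)
    return [fault for fault in faults if circuit(*vector, stuck=fault) != good]


def split(faults, parts):
    """Divide the fault set into `parts` subsets of nearly equal size."""
    return [faults[i::parts] for i in range(parts)]


def parallel_fault_sim(faults, vector, workers=3):
    subsets = split(faults, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda subset: simulate_subset(subset, vector), subsets)
        detected = [fault for part in results for fault in part]
    return sorted(detected)


all_faults = [(name, value) for name in "abc" for value in (0, 1)]
```

Each fault is simulated independently of the others, which is
why the work divides cleanly across jobs.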

In summary we can say that VLSI simulation—as a special 
case of discrete-event simulation—is definitely a promising 
application of parallel general-purpose machines like Pooma. 

The functional approach. In subproject B we chose to 
take a lazy functional language, which had no parallelism 
annotations, and implement it efficiently on a parallel archi¬ 
tecture. At the beginning of the project, implementations of 
functional languages were interpretative, sequential, and slow. 
It seemed we could obtain significant increases in perfor¬ 
mance by 1) designing specialized hardware to support their 
execution, and 2) designing a parallel implementation that 
exploited their theoretically implicit parallelism. 

We chose lazy functional languages because they seemed 
to be a very powerful programming paradigm, which is im¬ 
portant for designing reliable and large systems. Furthermore, 
the parallelism was theoretically implicit in the execution 
mechanism of the language and thus invisible to the 
programmers. 

We approached the problem in a “language-first” manner. We
first developed a parallel evaluation model, which retained the
semantics of the language and then drove the architectural
design. The original project description speci-

continued on p. 61

Transputers—Past, Present, and Future


High-performance enhancements in transputers signal a trend toward general-purpose 
computing. We present the progress, products, and results of ongoing work in transputer 
development. 


Colin Whitby-Strevens 

Inmos Limited 


A transputer transformation is underway. Exploiting the
multiprocessing potential of the transputer, ESPRIT projects
continue to develop higher performance, lower cost parallel
processing computers. In advancing the transputer toward
general-purpose computing, Inmos Limited is also improving the
transputer’s suitability in embedded systems with the upcoming
release of new products.

The first transputer emerged at a time when 
very large scale integration (VLSI) technology 
permitted a combination of a small, fast pro¬ 
cessor (20,000 transistors) and local memory and 
communications facilities on silicon. At the same 
time, the effective exploitation of VLSI technol¬ 
ogy also resulted in the development of reduced 
instruction-set computing (RISC) processors. Both 
the transputer and conventional RISC proces¬ 
sors reevaluate architecture trade-offs in the 
context of VLSI capabilities. However, devel¬ 
opers of the transputer created a design at the 
instruction-set level to support multiprocessing 
across a number of transputers and within a 
single transputer without the overheads usu¬ 
ally associated with complex runtime software. 

In embedded systems, the transputer application designer
directly controls the transputer hardware without the need for a
resident operating system or runtime kernel. Applications for
transputers include office systems (fax, videophones, laser
printers, terminals), digital telecommunications, military
systems and hand-held satellite navigation systems (see box on
sample applications), industrial control systems, and music
synthesizers.

Initial promotion of the transputer focused 
on its use as a general-purpose component for 
special-purpose systems. A number of applica¬ 
tions—particularly in graphics and image 
processing—clearly required high-performance, 
floating-point operations. Inmos achieved this 
capability very effectively by adding a floating¬ 
point unit within the transputer chip (in con¬ 
trast to the conventional, but more cumbersome, 
coprocessor approach). This advancement 
opened up the exciting prospect of construct¬ 
ing parallel processing machines using arrays 
of transputers to produce supercomputer-level 
performance. 

Application requirements for simulation and modeling (for
example, quantum chromodynamics and fluid-flow analysis)
provided the incentive to establish the ESPRIT Supernode

Two sample applications


The transputer’s suitability to a wide range of embed¬ 
ded systems is illustrated by considering applications at 
both extremes. The first application—a hand-held naviga¬ 
tional system—uses just one transputer to address such 
issues as high performance, low cost, and low power. The 
second application—a long-range, three-dimensional ra¬ 
dar system—uses up to 4,000 transputers to provide 
supercomputer levels of performance, but in a real-time 
application. 

Hand-held satellite navigation system 

Some 18 satellites, in six orbits, operate as part of a 
Global Positioning System. Each satellite continuously 
transmits its position in an encoded form, and by listen¬ 
ing to four satellites one can calculate the position (in 
three dimensions) of the receiving station. 

The traditional approach involves the use of complex 
dedicated signal processing hardware. However, the Gypsy 
hand-held receiver, developed by Columbus Positioning 
Limited, performs the complex signal processing and 
mathematics required in one transputer, which also ser¬ 
vices a keyboard and controls a display (see Figure A). 

A large, developing market for the system includes 
marine navigation, transport control systems (reporting 
delivery truck positions), and automotive applications 
(dashboard navigation systems). 

The advantages of the transputer-based solution over 
the traditional approach include relatively fast acquisi¬ 
tion time (which also saves battery power), the elimina¬ 
tion of custom logic for signal processing, the minimal 
amount of glue logic required, the low specification re¬ 
quired on radio oscillators (transputer software performs 
frequency compensation), and quick implementation of 
future product changes. 

Long-range radar system 

The Martello long-range, three-dimensional radar sys¬ 
tem under construction by Marconi Radar Systems uses up 
to 4,000 transputers for signal processing of the radar re¬ 
turns. 1 The transputers combine to form a very fast parallel 
computer that accepts digitized radar returns at its input 
and defines the range, bearing, and height of every target 
at its output. (Some environmental information is also added 
to the output.) The processing load amounts to roughly 
three billion operations per second through each radar 
channel. 

Figure A. Gypsy hand-held receiver.

The alternative to transputers, hardwiring, is inherently
expensive (previous Martello systems used about 50 different
board types). With radar technology now moving so fast,
hardwiring is doubly expensive because boards are quickly
rendered out of date. By contrast, in the transputer system,
software performs the entire signal processing task, and the
same basic computer remains usable throughout a whole
generation of radars.

The signal processing algorithms run in real time by using
pipeline parallelism to spread the parts of the algorithms
across an array, with computation overlapping communications
(see Figure B). The optimum map was devised by simulation,
making it necessary to build only a small part of an array to
prove the concept. In an array node of 50 transputers,
housekeeping uses two transputers, while the rest are available
for data processing.

Figure B. Diagram of Martello radar system architecture.

Geometric parallelism achieves the processing rate 
required for the Martello radar. The radar coverage breaks 
into range bands, and a particular array node directs all 
samples from a given band. Each array node executes 
an identical program, but on different data. The scalable 
performance of the radar is proportional to the number 
of array nodes. 
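
A small Python sketch may make the decomposition concrete. It
is an illustration under assumptions, not the Martello software:
samples are taken to be (range in km, amplitude) pairs, the
per-node program simply picks the strongest return, threads
stand in for array nodes, and every name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def strongest_return(samples):
    """The identical program every array node runs: pick the strongest echo."""
    return max(samples, key=lambda sample: sample[1])


def process_by_range_band(samples, band_width_km, n_nodes):
    """Send each (range_km, amplitude) sample to the node that owns its band."""
    per_node = [[] for _ in range(n_nodes)]
    for sample in samples:
        band = int(sample[0] // band_width_km)
        per_node[min(band, n_nodes - 1)].append(sample)  # last node takes the tail
    occupied = [(node, data) for node, data in enumerate(per_node) if data]
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        results = list(pool.map(strongest_return, [data for _, data in occupied]))
    return {node: result for (node, _), result in zip(occupied, results)}


radar_samples = [(5, 0.2), (12, 0.9), (18, 0.4), (27, 0.7)]
per_band = process_by_range_band(radar_samples, band_width_km=10, n_nodes=3)
```

Since no node needs data from another band, adding nodes (and
narrowing the bands) raises throughput roughly in proportion,
which matches the scaling claim in the text.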


The vast majority of the processors in the Martello computer
are 16-bit T222 transputers, with some 32-bit T425 transputers.
A follow-on project proposes to construct a multifunctional
radar for the Italian navy using floating-point transputers.
However, no fundamental change to the architecture is necessary
to take advantage of new generations of transputers.


project (see ESPRIT Supernode box). This project led to the
development of many new commercial hardware and software
products, including the Inmos T800 transputer, the Parsys
Supernode machine, the Telmat T-node machine, and N.A. Software
Limited’s parallel processing libraries for Fortran numerical
routines. More significantly, the project contributed greatly
in establishing the need for reconfigurable systems and in
creating paradigms for parallel programming.

The personal workstation also opened up another market for the
transputer, since it provides the basis for workstation
accelerators. More than 30 companies now market

The ESPRIT Supernode Project

Over about a four-year period (Dec. 1985-Nov. 1989), 
the Supernode Project aimed to create a low-cost, high- 
performance computer system from transputers. This 
project incorporated the development of the T800 
floating-point transputer 2 and the Supernode RTP 
(Reconfigurable Transputer Processor). 

One Supernode consists of 16 transputers connected 
via a crossbar switch (also developed during the project). 
Using crossbar switches, multiple Supernodes combine 
to provide systems of up to about 1,000 transputers with 
no limitations on topology (except those implied by the 
four links on each transputer). Constructing larger sys¬ 
tems remains possible with minor constraints on topology. 

The wide range of applications demonstrated on the 
Supernode included finite-element analysis, logic simu¬ 
lation, luminosity simulation, and real-time image analy¬ 
sis. Development of parallel algorithms and parallel 
numerical libraries also occurred. 

The main results demonstrated the feasibility of the 
system, and the surprising ease in programming appli¬ 
cations to take advantage of concurrency. A large num¬ 
ber of European research projects base their development 
of parallel computing techniques on the now commer¬ 
cially exploited Supernode. 


accelerator cards, offering a range of hardware and soft¬ 
ware capabilities. Initially these cards targeted the develop¬ 
ing transputer embedded systems market. However, as 
application software for the transputer grew, the market¬ 
place widened—first to engineering design workstations and 
more recently to commercial and financial workstations. 
Several workstations based entirely on transputers are now 
available. 

ESPRIT projects continue to attack the difficult problems 
of providing very high performance systems (see box for 
current projects). The original issues concerned the feasibil¬ 
ity of constructing such systems (possibly containing thou¬ 
sands of processors) and determining the methods of 
programming individual applications. The Supernode project 
developed a machine that could effectively execute a wide 
range of applications, and in many cases nearly achieved 
the theoretical maximum performance from the machine. 

Current ESPRIT project issues involve “scalability,”
“portability,” and programming ease. Scalability refers to the
ease of achieving increased performance from an application by
using more processors. Portability relates to the ease of
transferring programs between parallel machines of different
architectures. The term “general-purpose” summarizes these
issues of parallel computing. A general-purpose parallel
computer executes a range of applications without concern for
the number of processors employed or the topology in which they
are connected. Indeed, whether the underlying hardware
architecture is based on message passing (such as
transputer-based and hypercube-style systems) or shared memory
(such as Sequent’s Balance system or Cedar systems) is of no
concern to the programmer.

Current ESPRIT projects

The Supernode 2 project studies software issues, particularly
those concerned with advanced operating systems, in the context
of the Supernode machine. Such an operating system converts the
machine from one that runs one application into a more
general-purpose facility.

The two-year PUMA (Parallel Universal Message-passing
Architecture) project explores the possibilities offered by the
new virtual communications architecture (see “The
next-generation transputer” section in the main article).
Besides studying the performance offered by various topologies
and the implications for computer architecture, the project
examines high-level models of parallelism and their
implementation using a virtual message-passing system.

The three-year GPMIMD (General Purpose, Multiple-Instruction,
Multiple-Data) project aims to develop a standard European MIMD
architecture based on H1 transputers and virtual
communications. Four main European transputer-supercomputer
manufacturers (Meiko, Parsys, Parsytec, and Telmat) currently
work as key collaborators on the project, which is led by Inmos.

The OMI MAP (Open Microsystems Initiative Microprocessor
Architecture Project) forms the core of the European-led Open
Microsystems Initiative. It aims to define the architecture of
a new microprocessor family designed to exploit the VLSI
capabilities anticipated in the latter half of the 1990s.
Collaborators include Bull, Inmos, Olivetti, Siemens, and
Thomson.

Several other projects exploit the transputer architecture for
specific application areas. For example, Padmavati (Parallel
Associative Development Machine as a Vehicle for Artificial
Intelligence) uses transputers and associative memory for
symbolic processing.

The expertise derived from ESPRIT and Supernode projects
continues to spin off other applications. Migration of
processing techniques and applications from general-purpose
computers into special-purpose systems signifies a definite
computing trend. For example, high-performance laser printers
incorporating Postscript processing and intelligent terminals

continued on p. 76

The European Declarative System, Database, and Languages

The EP2025 EDS project develops a highly parallel information
server that supports established high-value interfaces. We
describe the motivation for the project, the architecture of
the system, and the design and application of its database and
language subsystems.


In 1988 Bull, ICL, Siemens, and their jointly owned European
Computer Research Centre (ECRC) identified a common interest in
supporting the processing of future intensive applications. The
four partners defined a European Declarative System proposal,
which the European Commission supported as project EP2025, a
part of the European Economic Community’s ESPRIT program. The
EDS project began in 1989 and extends until 1992 with phases of
definition, component development, and system integration.

EDS machines primarily function as informa¬ 
tion servers to manage all varieties of informa¬ 
tion intelligently. They will support languages and 
interfaces of value that are already established 
and in use: Unix, extended SQL, Lisp, C++, and 
the ECRC Elipsys parallel logic programming 
language. 

The following analysis of the requirement for 
information servers motivated the EDS project and 
justifies this role for EDS machines. 

Information server requirement 

Enterprises in the industrial, service, govern¬ 
ment, administrative, and defense sectors use in¬ 
formation technology today. They depend 
increasingly on their information resource to: 

• reduce operational costs, 

• improve effectiveness from stock holding to 
customer service, 


• support business development in new mar¬ 
kets, and 

• create a lasting competitive advantage in a 
rapidly changing world. 

These enterprises often regard their information 
as more important than their next product or 
service. As a result, they pursue a systems archi¬ 
tecture that delivers highly reliable information 
technology support and comprises a: 

• complete, coherent, and robust information 
base; 

• portfolio of applications interworking 
through data; and 

• framework of long-life interfaces protecting 
their investments. 

Large corporate systems increasingly play the 
role of database or information servers; servicing 
the SQL interface today requires some 75-80 
percent of the processor cycles. Many factors 
increase the load on information servers, a fact 
likely to require the development of systems with 
highly parallel architecture to meet the future 
demand. 1 

Information resource. We’ve deliberately chosen the word
information to be an umbrella term for the complete knowledge
spectrum. We see this spectrum ranging from conventional
formatted data to less structured text, representations of
sound and image, and higher order knowledge in the forms, for
example, of constraints, integrity rules, business rules, and
processes. Knowledge in its broadest sense, of course, includes
facts and analysis, certainty, rumor, and speculation.

The volume of such information reportedly grows at some 
25-30 percent yearly, a rate that we expect to be sustained by 
increased interest in text and image. Since this information is 
so valuable, we hope to store it with security levels that would 
be the envy of any bank. These levels suggest a large, central 
facility rather than a set of all too portable personal computer 
disks. 

Information access. Only on-line systems support the
effectiveness needed in our budget-conscious, competitive
world. Literate information workers, desktop technology, CASE
(computer-aided software engineering) tools, and
business-to-business systems increase the volume of on-line
transactions at some 20-30 percent a year. In addition,
responses must come in a suitably short time, regardless of
information volumes, transaction rates, or the incompleteness
of the data input to specify the query.

We can characterize transactions in terms of frequency and 
complexity, as judged by the load they place on the computer 
system. Classic transaction processing occurs at a high rate 
with low complexity, while knowledge-based systems are 
highly complex and process at a low rate. Any computer 
system has a finite throughput capacity, shown by the per¬ 
formance frontier (see Figure 1). 



Figure 1. The range of transaction types (axis label: transaction complexity).

Some evidence shows that performance problems impede 
the exploitation of “fifth-generation” knowledge-manipula¬ 
tion techniques. However, complex queries over knowledge 
bases do exist in several areas. These areas include government
administration, CAD (computer-aided design) systems
including software engineering, the storage and scheduling 
system of distribution industries, and the remote maintenance 
systems of large utilities. 

EDS technology intercept 

The EDS project aims to advance the information server 
performance frontier in the fastest way. We plan to do so by 
intercepting key hardware and software technologies and 
integrating them behind established interfaces of high value 
to prospective customers. 

The ANSI/ISO SQL standard 2 is the key interface today 
between the application and the database manager. This query 
language always allows the user to ask for a set of records, 
most likely chosen from a large database. In principle, a mil¬ 
lion processors could simultaneously assess one each of a 
million records to service an SQL query with obvious benefits 
to the response time of the query. 

Opinions differ as to whether SQL will evolve sufficiently 
to meet the new requirements for managing more complex 
data types and manipulating knowledge. We believe the cur¬ 
rent investment in SQL will guarantee SQL a long life. We 
therefore proposed an extension of SQL, ESQL, to meet fu¬ 
ture requirements for more comprehensive databases. 3 

On the hardware side, the most rapid change is occurring 
in microprocessors, which continue to increase in raw power 
at some 50 percent each year. Today, microprocessors prom¬ 
ise 25 MIPS (million instructions per second); tomorrow, 40, 
60, and 100 MIPS. The challenge for the computer system 
architect is to achieve a similar increase in total systems 
throughput. 
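
As a rough check on these figures, the growth can be compounded year by year. The sketch below is illustrative only: the function name and the assumption of steady yearly compounding are ours, not part of the EDS design.

```python
def projected_mips(base_mips: float, annual_growth: float, years: int) -> float:
    """Compound a base MIPS rating at a fixed annual growth rate."""
    return base_mips * (1.0 + annual_growth) ** years

# Assumed starting point: 25 MIPS today, growing 50 percent a year.
for year in range(5):
    print(year, round(projected_mips(25.0, 0.5, year), 1))
```

Under this assumption the 40-, 60-, and 100-MIPS ratings fall roughly one, two, and between three and four years out.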

In addition, storage technologies are diversifying; large- 
scale RAM storage is often the most cost-effective way to 
improve total systems performance and the price/performance 
ratio. We expect to see today’s commodity, 4-Mbit dynamic 
RAM chip succeeded by the 16-Mbit chip in 1994 and the 64- 
Mbit chip in 1998. DRAM cost per byte now drops at 60 
percent a year, and in 1995 we expect DRAM storage to be 
only 20 times the cost of magnetic-disk storage. 

The EDS machine therefore exploits microprocessors and 
large DRAM storage, supported by a communications infra¬ 
structure of suitable responsiveness and bandwidth. It avoids 
the bottleneck of a single path from processing power to 
storage by adopting a distributed “share-nothing” architec¬ 
ture. This architecture offers linear performance returns when 
the number of processors increases into the hundreds. 
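
The share-nothing idea can be sketched in a few lines of Python. This is not EDS code: the hash partitioning, the function names, and the use of threads to stand in for processing elements are all assumptions made for illustration. Each node scans only the records it owns, and the partial results are merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(records, n_nodes, key):
    """Hash-partition records so each node owns a disjoint subset."""
    nodes = [[] for _ in range(n_nodes)]
    for rec in records:
        nodes[hash(rec[key]) % n_nodes].append(rec)
    return nodes

def local_select(local_records, predicate):
    """Each node scans only its own storage; nothing is shared."""
    return [rec for rec in local_records if predicate(rec)]

def parallel_select(nodes, predicate):
    """Run the same selection on every node and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(local_select, nodes, [predicate] * len(nodes)))
    return [rec for part in partials for rec in part]

# Hypothetical data: 1,000 account records spread over 8 nodes.
records = [{"id": i, "balance": i * 10} for i in range(1000)]
nodes = partition(records, n_nodes=8, key="id")
rich = parallel_select(nodes, lambda r: r["balance"] > 9900)
```

Python threads only illustrate the structure here; the performance argument in the text rests on separate processors, each with its own storage.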

The EDS system 

Figure 2. EDS system architecture.

The static and simplified view of the EDS system seen in
Figure 2 identifies the main interfaces and
components. We designed the system to comprise a parallel
processing machine and Emex kernel supporting Unix, extended
SQL, Lisp, and Elipsys subsystems. The system will
attach as an accelerator to a variety of Unix and proprietary
hosts and be configurable up to 256 processors, each with
up to 64 Mbytes of storage. We predict the following performance
figures:

• Database processing. Meets the simple line-of-business 
Transaction Processing Council A benchmark performing 
12,000 transactions per second at 30 percent utilization. 

• Lisp. Meets the Boyer benchmark performing 140 Boyer 
runs per second. 

• Elipsys. 32 MLIPS (million logical inferences per second) 
on average. 

The EDS hardware. The EDS parallel machine 4 consists of 
a message-passing network, which provides a number of 
identical connection ports for attaching various functional 
elements. We envisage four types of elements: processing, 
diagnostic, input and output, and host connection (Figure 3). 



Figure 3. EDS hardware. 


The processing element to be implemented for the proto¬ 
types consists of (see Figure 4): 

• a main processing unit, a high-performance Sparc RISC 
(reduced instruction-set computer) with matching cache 
and memory management unit; 

• a system support unit to offload the most critical paral¬ 
lelism primitives from the main processor; 

• a network interface unit providing buffering and data 
transfer; and 

• a local storage unit holding a maximum of 64 Mbytes of 
data. 


Figure 4. EDS processing element.

We expect later production versions of EDS to exploit the 
16-Mbit chip and support 4 Gbytes of nonvolatile memory 
per processing element. We plan to simulate the effect of the 
well-understood nonvolatile storage during the project. 

We designed the processing element to support efficiently 
the parallel operations of the execution models of the kernel, 
the database system, and the language systems. In addition 
to executing instructions within a normal sequential thread 
of computation, the processing element must support basic 
kernel operations such as passing a message. The primitive 
machine interface, or PMI, shown earlier in Figure 2, pro¬ 
vides specific operations to support kernel functionality. PMI 
also introduces the required independence between the ker¬ 
nel software and the parallel machine hardware. 

The EDS kernel and PCL. The EDS Process Control Lan¬ 
guage is the common interface through which all subsystems 
exploit and provide guidance to the parallelism features of 
the machine. 

The concepts upon which PCL is based closely relate to 
those in Unix and in existing kernel interfaces such as Chorus 
Systeme’s Chorus and Carnegie Mellon’s Mach, both of which 
are designed to manage distributed systems. These concepts 
include virtual memory, processes, and interprocess com¬ 
munication. PCL develops these concepts to provide the 
functionality and performance levels that a large-scale parallel
system like EDS offers.

We particularly developed certain features of PCL in EDS: 

• a multilevel process-context model with very lightweight 
threads, 

• a storage model providing considerable flexibility in the 
sharing and management of virtual memory in a distrib¬ 
uted system, 

• efficient and reliable message passing, 

• an exception-handling mechanism based on the message¬ 
passing scheme, and 

• flexible scheduling and load balancing for a highly par¬ 
allel system. 

The inclusion within the EDS architecture of a common 
kernel and PCL interface brings a number of benefits: stan¬ 
dard control of the machine, the exploitation of parallelism, 
and the use of system resources. 


The EDS database system 

The main exploitation focus of the EDS project is the development
of an advanced database server. The server will provide
an order-of-magnitude performance improvement over
mainframes and advanced functionality to extend the range
of applications it supports.

The improved functionality will include facilities for: 

• support of user-defined data types and methods, 

• support of complex objects and large objects, 

• deductive database capabilities, 

• general integrity constraints, and 

• triggers (actions to be carried out when a given event 
occurs). 

These features will not only extend the range of applications
that can be supported efficiently and naturally but will
also increase programmer productivity beyond that currently
supported by standard relational database systems.

To achieve these objectives, we use a number of design 
strategies: 

• exploitation of the parallelism available in the base EDS 
system; 

• exploitation of large, stable RAMs to hold the persistent 
data over time and across system breaks; 

• a database system based on standard relational database 
technology that is extended to provide object-oriented 
database and deductive database facilities; 

• an interface, ESQL, 3 which is an extension of SQL. (The
language provides a rich and extensible type system
based on ADTs, or abstract data types, in which the
methods can be defined in various programming languages.
It also provides complex objects with object
sharing by combining the ADTs with object identity, and
a Datalog-like deductive capability);

• database queries compiled into native machine instruc¬ 
tions wherever possible; and 

• an optimizer designed to be extensible to allow the sys¬ 
tem to evolve. 
Figure 5. The logical structure of the database system.

Database system architecture. The database system splits 
into three main components, as shown in Figure 5. The Re¬ 
quest Manager compiles database commands into a native 
machine code, the Data Manager provides the runtime facili- 
Architectures Within the
ESPRIT SPAN Project 


The SPAN project pooled the resources of numerous researchers in several countries to 
integrate symbolic and numeric computing on parallel systems. The resulting Kernel System 
architecture provided a central model for which two programming languages and two 
parallel-system architectures were developed. This article sums up the project and discusses 
its advancements. 


Peter Rounce 

University College 
London 

Jose Delgado 

Instituto de Engenharia 
de Sistemas e 
Computadores 


ESPRIT Project 1588, or SPAN (Symbolic
Processing and Numeric), concerns
the integration of numeric and
symbolic computing on parallel architectures.
The participants in this project are
the Computer Technology Institute (Patras, 
Greece), the Instituto de Engenharia de Sistemas 
e Computadores (Lisbon), PCS Computer Systeme 
GmbH (Munich), Thomson-CSF Cimsa Sintra 
Division (Paris), Thorn-EMI Central Research Labs
(London), the University of Athens, and the 
Department of Computer Science at University 
College London. 

SPAN activities ranged from application soft¬ 
ware to VLSI (very large scale integration) hard¬ 
ware design and manufacture. The major goal of 
SPAN was to investigate symbolic and numeric 
integration on parallel computers at all levels: 
application, language, system, and architecture. 
This article briefly overviews the project before 
concentrating on its architectural research. 


The project 

A three-year period of research completed 
January 1990 included: 

• identifying parallelism in particularly com¬ 
plex applications and developing programs 
to demonstrate the use of this parallelism 
with parallel architectures, 


• working at the language level to extend ex¬ 
isting languages with parallel and other 
constructs to allow the development of pro¬ 
grams for concurrent execution, 

• working at the software system level to en¬ 
able the easy mapping of concurrent pro¬ 
grams onto existing machines, and 

• working on novel architectures to provide 
increased hardware support. 

We were aware of previous difficulties in pro¬ 
gramming parallel systems, porting existing soft¬ 
ware to them, exploiting the power of the systems, 
and extracting the full parallelism of the applica¬ 
tion. We wanted to develop an architectural model 
that would port to a number of parallel systems. 
This model would provide a fixed target for ap¬ 
plications and languages, enabling easy porting 
of these elements to different hardware. 

Most important, we wanted to investigate the 
integration of symbolic and numeric computing. 
Each type of computing is very powerful in its 
particular realm of application. Symbolic lan¬ 
guages provide for concise and precise programs, 
by which complex programming tasks can often 
be simply expressed and proved formally correct. 
However, these languages are often slow to ex¬ 
ecute because their operational model does not 
map well to standard architectures. 

Numeric languages, conversely, match standard 
architectures particularly well and provide for fast execution
and easy system control. The integration of symbolic and 
numeric computing in one environment promises to provide 
the powers of both with consequential results in programming 
capabilities. 

We based the SPAN project on one architectural model of 
parallel computing. This model, called the Kernel System, is 
a generalized, abstract architecture. It directly supports both 
symbolic and numeric computing, as well as parallel pro¬ 
cessing. The Kernel System was the central component of 
the project to which all participants contributed, although 
Thorn-EMI and University College London had direct responsibility
for the design and development. The Kernel
System served as a target for the language and applications 
and facilitated their porting to standard architectures. 

We started with two instantiations of the model. 1 Thorn-EMI
was the primary developer of a high-level language called 
Parle (Parallel Architectures and Languages Europe). 2 UCL 
developed a low-level language called the Virtual Machine 
Code (VMC). 3 

We directed much of the work outside of developing the 
Kernel System toward introducing or applying parallelism to 
existing languages and applications: 

• Thomson-CSF investigated Prolog and extended it with 
parallel constructs and numeric capabilities. This par¬ 
ticipant also developed a parallel numeric-symbolic 
system for image interpretation. 

• PCS extended Lisp with parallel constructs. 

• The Computer Technology Institute developed an expert 
system for the solution of partial differential equations, 
involving both symbolic and numerical processing in a 
single application. The investigations included how to 
identify and apply parallelism in the solution process. 

• The University of Athens investigated the introduction 
of parallelism into a real-time database management 
system. 

• Thorn-EMI designed a real-time expert system suitable
for a parallel architecture. 

• UCL developed an object-oriented environment for in¬ 
tegrating symbolic and numeric programming. 

The language work targeted Parle, the high-level interface 
to the Kernel System, while the applications targeted some 
subset of Prolog, Lisp, or Parle directly. 

An associated activity concerned the development of an
object-oriented programming environment. The principal
aim was to investigate and develop an object-oriented
framework for integrating heterogeneous hardware and
software systems. Thus, the environment had to provide for
the production of programs with components written
in different languages and facilitate the
exchange of data between the components. It would map
these programs onto diverse hardware systems, particularly
parallel systems, and manage the execution of these programs.
To some degree, this environment provided an alternative
route to the integration of symbolic and numeric
programming. 4

Researchers at UCL developed such an environment, called 
Coside (originally called Solve). Further work is in progress. 5 
Coside is a flexible system that can be targeted (more or less 
efficiently) at a large variety of systems and/or languages. 
The Kernel System is among them, and we expect it to pro¬ 
vide efficient support for the environment. 

We targeted a number of systems for the Kernel System at 
the architectural level. These included Supernode, developed 
by Thorn-EMI in ESPRIT Project 1058, and Padmavati (Parallel
Associative Development Machine as a Vehicle for Artificial 
Intelligence), developed by Thomson-CSF in ESPRIT Project 
967. The SPAN project produced three architectures: Sprint, 
DICE (Distributed Interconnected Computing Elements), and 
the D-machine. The objective of UCL’s Sprint was to develop 
a parallel architecture to support the Kernel System efficiently. 
Sprint contains custom VLSI components and uses the VMC 
as the architectural model. (Figure 1 illustrates the relationship 
between projects within SPAN.) 

DICE is an object-oriented, parallel architecture developed
by the Instituto de Engenharia de Sistemas e Computadores (Inesc).
It was initially targeted to run on Inmos transputers. Inesc
designed custom VLSI components to directly support the
architecture.

The D-machine—a traditional bus-based multiprocessor 
developed from off-the-shelf components—uses Motorola 
68030 processors. Its designer, PCS, also developed a real¬ 
time operating system for the D-machine. We present Sprint 
and DICE in greater detail later. 

Kernel System 

This system provides an architecture that supports both 
symbolic and numeric computing with equal facility. 1 The 
Kernel System also provides multiple-instruction, multiple- 
data (MIMD) parallelism. The system contains key concepts 
taken from existing architectures for symbolic and numeric 
machines in an attempt to capture the best of both worlds. A 
key philosophy limited unnecessary complexity, providing 
only the essential primitives on which to develop complexity. 

Although it is an abstract machine model, the Kernel Sys¬ 
tem embodies a particular form of parallel architecture. Each 
processor has local, but not private, memory. The local 
memories form the logical, globally shared memory of the 
architecture. Any processor can communicate with any other 
processor via operations on the local memory of the remote 
processor, although the communications method is not 
specified. This approach provides a very general, shared- 
memory, MIMD architecture with point-to-point communi¬ 
cations. The architecture can be mapped onto other styles, 
even though the communication costs might be greater and 
nonuniform in a real architecture. 

In addition to shared memory, the system provides message¬ 
passing. A processor can use this procedure to access both 
its local memory and the nonlocal memories attached to other 
processors. The shared memory operations consist of the load 
and store functions common to many architectures in which 
a copy of a memory element is taken during a load operation 
and the current content of a memory element is overwritten 
upon a store. 

For message passing, a memory element must have both 
“nonempty” and “empty” states. This arrangement allows a 
get operation to identify and remove the content of a 
nonempty element, leaving it empty. A put operation only 
writes a value into an empty element, making it nonempty. 
If the condition of an operation is not met, the system blocks
the operation and suspends the executing process until the
condition is met.
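
The get and put semantics can be sketched as follows. This is a minimal Python model, not Kernel System code: the class name `Cell`, the sentinel `_EMPTY`, and the use of a condition variable to suspend the caller are our assumptions.

```python
import threading

_EMPTY = object()  # sentinel marking the "empty" state (assumed representation)

class Cell:
    """One memory element with empty/nonempty states (illustrative)."""

    def __init__(self):
        self._value = _EMPTY
        self._cond = threading.Condition()

    def put(self, value):
        """Write only into an empty element; block while it is nonempty."""
        with self._cond:
            self._cond.wait_for(lambda: self._value is _EMPTY)
            self._value = value
            self._cond.notify_all()

    def get(self):
        """Remove and return the content; block while the element is empty."""
        with self._cond:
            self._cond.wait_for(lambda: self._value is not _EMPTY)
            value, self._value = self._value, _EMPTY
            self._cond.notify_all()
            return value

cell = Cell()
received = []
consumer = threading.Thread(
    target=lambda: received.extend(cell.get() for _ in range(3)))
consumer.start()
for item in (10, 20, 30):
    cell.put(item)   # blocks until the consumer has emptied the cell
consumer.join()
```

Because each put waits for the element to be empty, the producer and consumer proceed in lockstep and the values arrive in order.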

The two forms of operation provide the familiar asynchronous
communication via shared memory and synchronous
message-passing in a style similar to the Occam
programming language. Synchronous message-passing is essential
in a parallel processing environment, while shared
memory is a proven and powerful method.

Key features in the Kernel System support the integration 
of symbolic and numeric processing: 

• the memory structure (through the provision of a memory 
element suitable for both styles of processing), and 

• memory access mechanisms and data operators appro¬ 
priate to both styles. 

The list-structured memory of the Kernel System holds
memory elements that are flexible in what they can contain.
A memory element can hold an integer or a list, or it can
be empty. All memory elements are contained in lists of
unlimited length. Figure 2 shows a possible arrangement of the memory
that is local to the processor. The root pointer gp points to 
the top-level list of the memory. This top-level list holds 
seven memory elements, three of which contain further lists. 
(The arrows in the figure indicate the lists held by particular 
memory elements.) Any list may contain other lists. Thus, 
the memory is tree structured and its depth is unlimited. 
The text [gp 4 4 3 2] in Figure 2 contains the address of the 
element to which the associated arrow points (see the Vir¬ 
tual Machine Code section in this article for further detail). 
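
An address such as [gp 4 4 3 2] can be read as a path from the root down the tree. The Python sketch below illustrates the idea under two assumptions of ours: element positions are counted from 1, and the sample memory layout is invented (it is not the layout of Figure 2).

```python
def resolve(root, path):
    """Follow a path of 1-based element positions down the list tree."""
    element = root
    for position in path:
        if not isinstance(element, list):
            raise TypeError("path descends into a non-list element")
        element = element[position - 1]
    return element

EMPTY = None  # stand-in for an empty memory element

# A made-up local memory: seven top-level elements, three of them lists.
gp = [7, [1, 2], EMPTY, [5, 6, 8, [9, 10, [11, 12]]], 3, [4], 0]

value = resolve(gp, [4, 4, 3, 2])
```

Here the path selects the fourth top-level element, then its fourth element, then the third, then the second, arriving at the integer 12.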



Figure 2. Example of list-structured memory (one element is labeled with its address, [gp 4 4 3 2]).

Lists are primitive data like integers. All memory functions 
operate on lists in local and nonlocal memory. The system 
provides operations to create, extend, truncate, or select from 
lists. Thus, the dynamic memory structure changes as lists 
are moved, copied, and modified. 

The system must randomly access the lists to support nu¬ 
meric operations. Any element of a list can be directly accessed 
with the same overhead—no matter what position the ele¬ 
ment occupies. This approach provides the access mecha¬ 
nism common to numeric processing. The arrays and 
structures/records of numeric processing are naturally mapped 
into lists. Having lists as primitive objects, with operators
provided on them, supports symbolic processing.

The fact that the same lists are used for both styles of 
processing enables the integration of the two styles. The sys¬ 
tem can create a list in a symbolic way (like the cons operator 
of Lisp) and then process it in a numeric way. The list is then 
dismembered in a symbolic style (like the car and cdr opera¬ 
tors of Lisp). 
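
A small Python sketch of this mixed style follows. The helper names mirror the Lisp operators mentioned above; the Python list is only an assumed stand-in for a Kernel System list.

```python
def cons(head, tail):
    """Build a new list by putting head in front of tail (Lisp-style)."""
    return [head] + tail

def car(lst):
    """Return the first element of a list."""
    return lst[0]

def cdr(lst):
    """Return the list without its first element."""
    return lst[1:]

# Symbolic construction.
samples = cons(3, cons(1, cons(4, [])))

# Numeric processing: direct indexed access, as with an array.
scaled = [samples[i] * 2 for i in range(len(samples))]

# Symbolic dismemberment.
first, rest = car(scaled), cdr(scaled)
```

The same object serves as a symbolic list and as a numeric array, which is the point of the shared memory element.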

The processing model of the Kernel System incorporates 
both multiprocessing on individual processors and parallel 
processing on a set of processors. Each processor has its 
own code, which is stored in the list-structured, von Neumann 
memory. The execution model has procedural semantics with 
explicit flow of control. Each processor has a set of tasks or 
processes, which are scheduled in some arbitrary fashion. 

Parle 

This high-level, procedural language follows the Algol style. 
It is a realization of the Kernel System. A key objective in its
design was supporting parallelism at both the processor and 
statement levels. Researchers initially envisaged Parle as a 
compiler target language. However, a number of the applica¬ 
tions projects used Parle as a parallel programming language 
because it was much more developed at the beginning of the 
SPAN project than the other languages under investigation. 
This fact complicated the design and development of Parle, 
since in some instances the two goals were in conflict. 

A Parle program consists of a set of data definitions and a 
set of processor definitions. The latter defines the code and 
associated local data to be loaded onto a processor in the 
target parallel system. Parle possesses a global memory that 
is accessed by name. Variables defined within a processor 
definition are local; others are global. Procedures, functions, 
variables, and processor definitions have names that allow 
them to be accessed and manipulated like other data. Since 
they are all implemented as lists, one would expect this 
capability. 

The global name scheme provides for interprocessor 
communications. Parle can nest procedures, functions, and 
data inside other procedures and functions. Since Parle is a 
scoped language in which the scope of names is defined 
statically by the code definition, the system can hide data 
and methods. 

A processor definition consists of local data, procedures 
and functions, and a code statement or statements. Some 
code is necessary for execution when the system starts up. 
All processors start up when code is loaded. This is the simplest 
and most general mechanism on which other startup 
mechanisms can be built. The mapping of code into proces¬ 
sor definitions provides for the allocation of code to proces¬ 
sors. The language provides replicators so that a processor 
definition can be mapped onto any number of processors. 

Parle has operators for creating, joining, splitting, and se¬ 
lecting from lists: 


1) a := [1, 2, 3, 4]    Create a list in variable a
2) b := # a             Take the length of the list in a
3) d := a ++ c          Join-concatenate two lists
4) e := a « 2           Form sublist of a from element 1 to and including element 2
5) f := c « n           Form sublist of c after element n to end
6) f\ 9 := a l 3        Copy element 3 of a into element 9 of f


Because Parle is loosely typed, a variable (memory ele¬ 
ment) can hold any data type—Boolean, integer, real, list, or 
empty. Therefore, the variables on the left-hand sides of the 
examples are untyped and accept whatever the right-hand 
sides produce. However, variables a and c in the last five 
examples must hold lists, since the length, join, split, and 
select operators must work on lists. Thus, the loose typing 
leads to a requirement for runtime type checking to determine 
the type of the contents of a variable. 
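
The runtime checks implied by loose typing can be sketched with Python analogues of the list operators. The function names, the 1-based element counting, and the choice of `TypeError` are our assumptions; Parle's own implementation is not shown in this article.

```python
def length(x):
    """Analogue of '# a': only lists have a length, so check at run time."""
    if not isinstance(x, list):
        raise TypeError("length requires a list")
    return len(x)

def join(x, y):
    """Analogue of 'a ++ c'."""
    if not (isinstance(x, list) and isinstance(y, list)):
        raise TypeError("join requires two lists")
    return x + y

def prefix(x, n):
    """Sublist from element 1 up to and including element n (1-based)."""
    if not isinstance(x, list):
        raise TypeError("sublist requires a list")
    return x[:n]

def suffix(x, n):
    """Sublist after element n to the end (1-based)."""
    if not isinstance(x, list):
        raise TypeError("sublist requires a list")
    return x[n:]

a = [1, 2, 3, 4]
c = [5, 6]
b = length(a)      # result is an integer
d = join(a, c)     # result is a list
e = prefix(a, 2)
f = suffix(c, 1)
```

The left-hand variables accept whatever type comes back, while each operator verifies its operands before working on them.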

The memory of Parle is list structured in the same way as 
the Kernel System. Copy semantics function as follows. Load 
operations always take a copy of the addressed memory, 
even if this means copying a list. In examples 3 through 5
above, the lists in a and c are unchanged by the operation. 
Lists can be deleted by assignment: 


f := [5, 99, [1, 4, 6], a]    Create a list in f

f := 9                        Delete what's in f
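
Copy semantics can be illustrated in Python, where a copy must be requested explicitly. The helper `load` is a hypothetical name of ours that models a load taking a full copy of the addressed element.

```python
import copy

def load(element):
    """A load always yields a copy, even when the element holds a list."""
    return copy.deepcopy(element)

a = [1, 2, 3, 4]
f = [5, 99, [1, 4, 6], load(a)]   # f receives its own copy of a
f[3][0] = 100                     # changing the copy inside f ...
unchanged = list(a)               # ... leaves the original list in a untouched
f = 9                             # reassignment discards the old list
```

After these statements the list in a is still [1, 2, 3, 4], and the list formerly held in f is no longer reachable.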

These assignment statements demonstrate the shared 
memory operation of Parle. The variables may be local or 
global. Parle also provides for message-passing operations: 

a := b?      Take a nonempty value from b, leaving b empty

c? := a      Write into c only when it is empty

dl := a ?    Perform single-buffered communication

During these operations, the get operation leaves the read 
variable in the empty state. These operators block further 
processing, which only continues when the operations have 
succeeded in getting or putting a value. 

Processor definitions allow for coarse-grain parallelism to 
be specified. Parle also supports fine-grained parallelism by 
providing operators for parallel execution at the statement 
level: 

i) S1 || S2 || ... || Sn

ii) S1 ; S2 ; ... ; Sn

iii) for {i: 1 || n} Si rof

iv) for {i: 1 ; n} Si rof
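
The difference between the parallel and sequential forms can be sketched in Python. The helper names `par`, `seq`, and `par_for` are hypothetical, and threads stand in for whatever mechanism a Parle implementation actually uses.

```python
from concurrent.futures import ThreadPoolExecutor

def par(*statements):
    """Run all statements concurrently; return once every one has finished."""
    with ThreadPoolExecutor(max_workers=len(statements)) as pool:
        futures = [pool.submit(s) for s in statements]
        return [f.result() for f in futures]

def seq(*statements):
    """Run the statements one after another, in the order given."""
    return [s() for s in statements]

def par_for(lo, hi, body):
    """Replicated parallel form: one instance of body per index value."""
    return par(*[lambda i=i: body(i) for i in range(lo, hi + 1)])

squares = par_for(1, 5, lambda i: i * i)
ordered = seq(lambda: "first", lambda: "second")
```

In this sketch the parallel form completes only when all of its component statements have completed, whereas the sequential form fixes their order.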


continued on p. 88 
Pygmalion:

ESPRIT II Project 2059, Neurocomputing 


Pygmalion aims to promote the European industry's application of neural networks and de¬ 
velop "standard" computational tools for their programming and simulation. A complete envi¬ 
ronment for developing algorithms and applications will demonstrate the network capabilities 
expected from their properties of massive parallelism, fault tolerance, adaptivity, and learning. 


Bernard Angeniol 

Mimetics 


In the last five years, we’ve seen a
dramatic explosion of interest in
neural computing that covered neural
applications, models, programming
environments, and neurocomputers. The
Pygmalion project aims to promote the applica¬ 
tion of neural networks by European industry and 
to develop European “standard” computational 
tools for the programming and simulation of 
neural networks. 

Tools for neural networks center on a pro¬ 
gramming environment comprising five major 
parts. They are a graphic monitor for controlling 
and monitoring a network simulation, an algo¬ 
rithm library of common neural networks, the 
high-level neural programming language N, the 
intermediate-level network specification language 
NC, and compilers for the target machines. 

Pygmalion also addresses the implementation 
of neural algorithms by wafer scale integration 
techniques. In fact, a VLSI (very large scale inte¬ 
gration) demonstrator for such a technology is 
already available. This development is the first 
step toward the building of a European general- 
purpose, highly parallel neurocomputer and the 
production of application-dedicated neurochips. 
These chips will be the goals of the future ESPRIT 
II project called Galatea. 

Pygmalion applications span the fields of image
processing, speech processing, and acoustic
signals. We selected key real-world applications
in image processing and speech processing, and
a small application in acoustic signals, to demonstrate
the potential of neural networks for various
industrial problems.

In image processing we investigated two important
application domains, remote data sensing
and factory inspection. Remote sensing includes
pattern recognition and interpretation of Spot
images of the earth’s surface, such as road traffic,
fields, and various kinds of ground. Factory
inspection covers the recognition and classification
of workpieces in a factory automation context.
The recognition must handle the usual problems
of workpiece position, overlap, and orientation
under different lighting conditions.

In speech processing, we planned to lay the 
foundations for an automatic speech recognition 
system by developing efficient learning algorithms 
for the basic building blocks. These basics include: 

• isolated word recognition for small and 
medium-size vocabularies; 

• speaker independence and adaptivity; 

• speech preprocessing, including noise 
reduction; 

• isolated word recognition in noisy environ¬ 
ments; and 

• subword-unit recognition and coarticulation. 

Acoustic signal classification of underwater
natural sounds applies neural computing in two
different ways. The first uses a processed version of the signal,
obtained through classical preprocessing algorithms, as input to
the neural classifier. The second applies neural classification
directly to the raw signal.

Table 1. Pygmalion research groups.

Partner      Laboratory                                                Role
Thomson-CSF  Division Systemes Electroniques, Paris                    Prime contractor; image processing; acoustic signal processing; N high-level language
INPG         Grenoble                                                  VLSI neurocomputer chips
CSELT        Centro Studi e Lab. Telecomunicazioni, Torino             Isolated word recognition (IWR)
Philips      Lab. d'Electronique et de Physique appliquee, Paris       Image processing
SEL          SEL Research Centre, Stuttgart                            Speaker-adaptive IWR
CTI          Computer Technology Institute, Patras                     Cellular automata tools; 3D pattern recognition
ENS          Ecole Normale Superieure, Paris                           Image pattern recognition
INESC        Instituto de Engenharia Sistemas e Computadores, Lisboa   Algorithm library
IRIAC        Universite Paris Sud, Orsay                               Algorithm library; low-level speech processing
UCL          University College London                                 Graphic monitor; NC intermediate-level language
UPM          Universidad Politecnica de Madrid                         Speech processing

The Pygmalion project brings together many of the lead¬ 
ing neural computing research groups from European indus¬ 
try, research institutes, and universities (see Table 1). 

NNPS 

A major goal of Pygmalion is to ensure the widest usage of 
the neural programming environment by making it as flex¬ 
ible and portable as possible. The major parts of the Neural 
Network Programming System include the: 

• Graphics monitor. The graphical software environment 
for controlling execution and monitoring of a neural 
network application simulation includes a simulation 
command language. This language sets up a simulation, 
monitors its execution, interactively changes values, and 
saves a trained network. 

• Algorithm library. The parameterized library of common 
neural networks, written in the N language, provides 
users with a number of validated modules for construct¬ 
ing applications. 

• High-level language N. This object-oriented program¬ 
ming language defines, in conjunction with the algo¬ 
rithm library, a neural network algorithm and application, 
by describing the network topology and its dynamics. 

• Intermediate-level language NC. The low-level, machine-independent
network specification language represents
the partially trained neural network applications, a format
analogous to P code for Pascal systems.

• Compilers. Programs are compiled for the target Unix-based
workstations and parallel transputer-based
machines.

Figure 1 illustrates this structure of the neural programming
environment.



Figure 1. Pygmalion neural programming environment. 

To ensure uniformity and consistency, the graphics monitor,
the algorithm library, and the N and NC languages support
a common interface with the following properties:

1) All components present the view of a neural system based
on a hierarchical structure of networks, layers, clusters,
neurons, and synapses.

2) All algorithms are parameterized. 

3) All algorithms and applications share a common repertoire
of data structures, function names, and system
variables.

4) The specification of an algorithm is separated from an 
application. 

The resulting uniform interface permits the development 
of generic applications and algorithms; one application may 
use many algorithms, and one parameterized algorithm 
may be configured for many applications. 
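The four interface properties can be illustrated with a minimal Python sketch. It is not Pygmalion code: the class names, `build_network`, and `forward` are hypothetical, and each layer is assumed to hold a single cluster. The hierarchy (network, layer, cluster, neuron, synapse) is the one named above, and the generic `forward` routine is kept separate from any application that uses it.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical names throughout; only the hierarchy itself comes from the text.

@dataclass
class Synapse:
    source: "Neuron"
    weight: float = 0.0

@dataclass
class Neuron:
    state: float = 0.0
    synapses: List[Synapse] = field(default_factory=list)

@dataclass
class Cluster:
    neurons: List[Neuron]

@dataclass
class Layer:
    clusters: List[Cluster]

@dataclass
class Network:
    layers: List[Layer]


def build_network(layer_sizes, initial_weight=0.0):
    """Application side: choose the sizes; consecutive layers are fully connected."""
    layers = [Layer([Cluster([Neuron() for _ in range(n)])]) for n in layer_sizes]
    for prev, nxt in zip(layers, layers[1:]):
        for neuron in nxt.clusters[0].neurons:
            neuron.synapses = [Synapse(src, initial_weight)
                               for src in prev.clusters[0].neurons]
    return Network(layers)


def forward(net, inputs, transfer):
    """Algorithm side: parameterized by the transfer function, works on any Network."""
    for neuron, value in zip(net.layers[0].clusters[0].neurons, inputs):
        neuron.state = value
    for layer in net.layers[1:]:
        for cluster in layer.clusters:
            for neuron in cluster.neurons:
                total = sum(s.weight * s.source.state for s in neuron.synapses)
                neuron.state = transfer(total)
    return [n.state for c in net.layers[-1].clusters for n in c.neurons]
```

Because `forward` only touches the shared data structures, the same routine can serve networks of any size, and a different routine could be applied to the same network.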

Graphics monitor. This monitor controls a neural network
simulation. It executes on a host computer, generating
a specific net simulator/emulator to be executed on a target
computer. The host, through X Windows graphics tools and
the command language, monitors the simulation/emulation.

The windows environment provides pull-down menus to
select and change the I/O format, network architecture,
network learning algorithm, network training and execution, and
displays of activations, weights, and so on.


Users initially specify a neural network in the N language
using modules from the algorithm library. They then translate
the program description into the NC language. Once the
network is configured and specified in NC, users can train or
use it via the monitor. The monitor lets them dynamically
modify the network, translate it to a particular target
machine, or save the trained or partially trained network
for later use, possibly on a different target computer.
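The idea of saving a partially trained network and resuming later can be sketched in a few lines of Python. This is an illustration only: the functions `train`, `save`, and `restore` are hypothetical, JSON stands in for the NC representation, and a single linear neuron trained with the delta rule stands in for a full network.

```python
import json


def train(weights, samples, rate, epochs):
    """Delta-rule training of one linear neuron; returns the updated weights."""
    w = list(weights)
    for _ in range(epochs):
        for inputs, target in samples:
            output = sum(wi * xi for wi, xi in zip(w, inputs))
            error = target - output
            w = [wi + rate * error * xi for wi, xi in zip(w, inputs)]
    return w


def save(weights, epochs_done):
    """Write the network state as machine-independent text."""
    return json.dumps({"weights": weights, "epochs_done": epochs_done})


def restore(text):
    """Read the state back, possibly on a different machine."""
    data = json.loads(text)
    return data["weights"], data["epochs_done"]
```

A network trained for a few epochs, saved, restored, and trained for the remaining epochs ends with the same weights as one trained without interruption, because the saved text carries everything the training loop needs.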

Algorithm library. The library contains the classic neural 
network algorithms in a parameterized specification that can 
be configured for a specific user application. It includes the 
popular algorithms of: 

• Gradient Back Propagation with or without feedback, 

• Hopfield, 

• Boltzmann Machine, 

• Simulated annealing, 

• Competitive learning, 

• Adaptive resonance theory, 

• Linear associative memory, and 

• Kanerva memory. 





Figure 2. Pygmalion algorithm library. 


These network algorithms already
specify the interconnection geometry and
the transfer equations. However, users
select the number of processing elements,
their initial state and weight values,
learning rates, and time constants.
The library divides into five main parts
(see Figure 2):

• an algorithm-independent section
that contains the network computation
subroutines and support
routines for memory and error
management;

• tools library routines for data file
I/O, network architecture
specification, and performance
measurement;

• algorithm modules, specifying the
parameterized algorithms;

• algorithm evaluation programs for
testing the algorithm modules; and

• the application library, which provides
the user-tailored application
modules.
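As an illustration of what a parameterized algorithm module might look like, here is a minimal Hopfield network in Python. It is a sketch, not the library's code: the function names and the `max_sweeps` parameter are assumptions. The algorithm fixes the interconnection geometry (full connectivity without self-connections) and the transfer rule, while the user chooses the number of processing elements through the length of the stored patterns.

```python
def hopfield_train(patterns):
    """Hebbian storage of bipolar (+1/-1) patterns; returns the weight matrix."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j] / n
    return weights


def hopfield_recall(weights, state, max_sweeps=10):
    """Update neurons one at a time until the state is stable or max_sweeps is hit."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            total = sum(w * s for w, s in zip(weights[i], state))
            new = 1 if total >= 0 else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state
```

An evaluation program in the sense of the fourth item could store one pattern, flip a single element, and check that recall restores the original.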

We are constructing the algorithm library
in two stages. Initially, we implemented
a C version of the library, for
use by the image and speech processing
applications in the project. Now, we are implementing
the library using the N language, first with a reduced number
of algorithms.

The N language. Both expert and naive users can use this 
high-level neural network programming language. They can 
develop neural algorithms and use operational algorithms in 
applications. 

N’s syntax is a subset of C++ with additional neural-oriented 
features. It allows the description of algorithms 

1) by defining specific types having their own data and 
behavior in analogy to a class in C++ and 

2) by assembling them in a modular tree hierarchy. 

A library containing predefined types and parameterized
algorithms (such as Hopfield, Back Propagation) accompanies
N.

The library may also contain unprotected programming 
objects. Programmers manage this part of the library, which 
is used in a read-write mode. Storage of the source code of 
both kinds of library objects lets the objects be consulted and 
integrated easily into any N program. 

A typical N program consists of a list of type definitions.
One type may be defined from previously defined ones.
Figure 3 displays the structure of a type definition.

new type xxx ( parameter-list )
    parameter declaration
    composite types declaration
    internal variables declaration
    internal function declaration
above:
    inherited variables declarations
plugin:
    communication fields (inputs)
plugout:
    communication fields (outputs)
connection:
    connections between communication fields
public:
    inheritance variables declaration
    activation methods of the type being defined


Figure 3. Structure of a typical N program. 
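The sections of a type definition can be loosely mirrored in Python. This is an analogy, not N syntax or semantics: the classes `Unit` and `Pair` are hypothetical, and only the ideas of input fields (plugin), output fields (plugout), connections between them, and a type composed from previously defined types come from the figure.

```python
class Unit:
    """A simple type with input fields (plugin) and one output field (plugout)."""

    def __init__(self, weights):
        self.weights = list(weights)           # internal variables
        self.plugin = [0.0] * len(weights)     # communication fields (inputs)
        self.plugout = 0.0                     # communication field (output)

    def activate(self):
        """Activation method: weighted sum of the inputs."""
        self.plugout = sum(w * x for w, x in zip(self.weights, self.plugin))


class Pair:
    """A composite type defined from two previously defined Unit types."""

    def __init__(self):
        self.first = Unit([1.0, 1.0])
        self.second = Unit([2.0])
        # connection: (source unit, destination unit, destination input index)
        self.connection = [(self.first, self.second, 0)]

    def activate(self, inputs):
        self.first.plugin = list(inputs)
        self.first.activate()
        for src, dst, index in self.connection:
            dst.plugin[index] = src.plugout
        self.second.activate()
        return self.second.plugout
```

Here `Pair().activate([1.0, 2.0])` sums the two inputs in the first unit and doubles the result in the second.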

Thanks to the choice made for its syntax, any program in
N can easily be translated into a C++ program and consequently
algorithms can be simulated on a sequential computer
such as a Sun or Apollo. Furthermore, programmers
can translate any neural network structure and algorithm in
N into an equivalent NC structure, thus generating the NC
version of any N program.


The N programming language provides three types of
facilities, with which programmers can:

• convert an N model (that is, an algorithm) into an abstract
representation that allows the semantic analysis and
the link editing of several algorithms in the same
application;

• reuse previously defined algorithms; and

• use tools and criteria to translate N applications into NC
network specifications.

The NC language. This language acts as an intermediate-level,
machine-independent representation for neural networks.
Programmers can translate a network, specified in
NC, onto a variety of computers for training or use. After
usage (training), programmers update the intermediate-level
specification of the network. The trained network can be
stored in a library, filed for later use, or mapped onto a different
computer. Machine independence is considered the major
feature of the Pygmalion NC language, and, to enhance
this, we have made the language a small subset of the draft
ISO standard C.

The NC intermediate-level representation divides the neural
network information into four different domains: the network
topology, the data of the system including neuron status
and synaptic weights, the functions defining the processing
in the network, and the control of the network activities. An
example of a framework for a neural network description
appears in Figure 4 on page 99.

The topology information is basically described by defining
the system variables (such as NETS, LAYERS, CYCLES,
LEARNING_RATE), using #define commands, and by completing
the system and config structures. The system structure
defines the central hierarchical structure in terms of nets,
layers, and clusters inside layers. The config structure specifies
the number of elements inside the hierarchy, such as the
number of layers inside each network, number of neurons
inside each cluster, and so on. The system structure also stores
data information, in terms of a neuron’s state and weights, as
well as functional information in the form of rules. These
rules relate to the functions that should be performed by a
neuron, such as weight summation and weight update.

Two methods provide control information. Some system
function definitions configure, initialize, and train the
networks, while a list of calls to the system functions centrally
controls the whole system.
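The four domains (topology, data, functions, control) can be sketched in Python as follows. This is not NC code, and it does not reproduce Figure 4. The system variable names come from the text; everything else is an assumption made for illustration: the dictionary layouts, the function names, the delta rule used for weight update, and the omission of the cluster level to keep the example short.

```python
# Topology: system variables, playing the role of the #define constants.
NETS = 1
LAYERS = 2
CYCLES = 50
LEARNING_RATE = 0.1

# "config": number of neurons per layer, indexed as config["neurons"][net][layer].
config = {"neurons": [[2, 1]]}


# Functions: rules that a neuron performs.
def weight_summation(neuron, inputs):
    """Rule: the weighted sum of the inputs becomes the neuron's state."""
    neuron["state"] = sum(w * x for w, x in zip(neuron["weights"], inputs))


def weight_update(neuron, inputs, target):
    """Rule: delta-rule correction of the weights toward the target."""
    error = target - neuron["state"]
    neuron["weights"] = [w + LEARNING_RATE * error * x
                         for w, x in zip(neuron["weights"], inputs)]


# Data: the "system" structure holds states, weights, and rules.
def configure(cfg):
    system = []
    for net in range(NETS):
        layers = []
        for layer in range(LAYERS):
            fan_in = cfg["neurons"][net][layer - 1] if layer > 0 else 0
            layers.append([{"state": 0.0,
                            "weights": [0.0] * fan_in,
                            "rules": {"sum": weight_summation,
                                      "update": weight_update}}
                           for _ in range(cfg["neurons"][net][layer])])
        system.append(layers)
    return system


def initialize(system, value):
    for net in system:
        for layer in net:
            for neuron in layer:
                neuron["weights"] = [value] * len(neuron["weights"])


def train(system, samples):
    input_layer, output_layer = system[0][0], system[0][1]
    for _ in range(CYCLES):
        for inputs, target in samples:
            for neuron, x in zip(input_layer, inputs):
                neuron["state"] = x
            for neuron in output_layer:
                neuron["rules"]["sum"](neuron, inputs)
                neuron["rules"]["update"](neuron, inputs, target)


# Control: a fixed list of calls to the system functions.
def run(samples):
    system = configure(config)
    initialize(system, 0.0)
    train(system, samples)
    return system
```

Keeping the rules inside the system structure, next to the data they act on, is what lets the control section stay a plain list of calls.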

Hardware integration 

Although we dedicated only a very small amount of money 
to the study of hardware integration in Pygmalion, we made 
some progress. We studied a decentralized architecture in- 

continued on p. 99

J

James, David V. Multiplexed buses: The endian wars continue; M-M Jun 
90 9-21 

Johnson, Stephen C. Hot chips and soggy software; M-M Feb 90 23-26 

K 

Kadota, Hiroshi, see Kaneko, Katsuyuki, M-M Apr 90 26-38 
Kahaner, David K. Software Report—Assignment Japan; M-M Aug 90 
4-6 

Kahaner, David K. Software Report—The Pax parallel computer; M-M 
Oct 90 5-6,91-93 

Kahaner, David K. Software Report—Quality improvement; M-M Dec 
90 48-51 

Kaneko, Hiroaki, Nariko Suzuki, Hiroshi Wabuka, and Koji Maemura. 
Realizing the V80 and its system support functions; M-M Apr 90 
56-69 

Kaneko, Katsuyuki, Masaitsu Nakajima, Yasuhiro Nakakura, Junji 
Nishikawa, Ichiro Okabayashi, and Hiroshi Kadota. Processing 
element design for a parallel computer; M-M Apr 90 26-38 
Kawabe, Ushio, see Hatano, Yuji, M-M Apr 90 40-55 
Kirrmann, Hubert. Micro World—Cocom: The next wall to fall?; M-M 
Feb 90 4 

Kirrmann, Hubert. Micro World—Minitel: The French love affair with 
telematics; M-M Apr 90 88-90 

Kirrmann, Hubert. Micro World—Train control systems; M-M Aug 90 
79-80 

Kirrmann, Hubert. Micro World—Train buses; M-M Oct 90 79-80 
Kirrmann, Hubert. Micro World—Reunification and the East German 
electronics industry; M-M Dec 90 

Kitahara, Takeshi, and Taizo Satoh. The Gmicro/300 32-bit 
microprocessor; M-M Jun 90 68-75 
Klein, Michael F., see Brown, Emil W., M-M Feb 90 10-22 
Kronlage, Bill, see Darley, Merrick, M-M Jun 90 36-47 
Kumar, Krishna A., and Brian Petrasko. Designing a custom DSP circuit
using VHDL; M-M Oct 90 46-53

L 

Lapsley, Philip D., see Bier, Jeffrey C., M-M Oct 90 28-45 
Ledbetter, William B., Jr., see Edenfield, Robin W., M-M Feb 90 66-78 
Ledbetter, William B., Jr., see Edenfield, Robin W., M-M Jun 90 22-35 
Lee, Edward A. Programmable DSPs: A brief overview; M-M Oct 90 
14-16 

Lee, Edward A., see Bier, Jeffrey C., M-M Oct 90 28-45 
Lee, K. H., K. S. Leung, and S. M. Cheang. A microprogrammable list 
processor for personal computers; M-M Aug 90 50-61 
Leung, K. S., see Lee, K. H., M-M Aug 90 50-61 
Leunig, Steve, see Hammer, Carsten, M-M Dec 90 20-23, 83-88 
Lu, Mi, see Sibai, F. N., M-M Aug 90 21-33 

Luu, J. Comments on ‘A comparison of RISC architectures’ by R. S. 
Piepho and W. S. Wu; M-M Apr 90 5 (Original paper, Aug 89 51-62) 

M 

Maemura, Koji, see Kaneko, Hiroaki, M-M Apr 90 56-69 
Mateosian, Richard. Review of ‘The Fifth Generation Fallacy’ (Unger,
J. M.; 1987); M-M Feb 90 7-8

Mateosian, Richard. Review of ‘Structured Walkthroughs, 4th edn.’
(Yourdon, E.; 1989); M-M Apr 90 86-87
Mateosian, Richard. Review of ‘The Matrix—Computer Networks and
Conferencing Systems Worldwide’ (Quarterman, J. S.; 1990); M-M
Apr 90 87

Mateosian, Richard. Review of ‘CASE—Using Software Development 
Tools’ (Fisher, A. S.; 1988); M-M Apr 90 87 
Mateosian, Richard. Review of ‘Computer Architecture—A 
Quantitative Approach’ (Patterson, D. A., and Hennessy, J. L.; 
1990); M-M Jun 90 5 

Mateosian, Richard. Review of ‘Mastering Technical Writing’ 
(Mancuso, J. C.; 1990); M-M Oct 90 76-77 
Mateosian, Richard. Review of ‘Cache and Memory Hierarchy
Design—A Performance-Directed Approach’ (Przybylski, S. A.;
1990); M-M Dec 90 46-47

Mateosian, Richard. Review of ‘The Elements of Spreadsheet Style’
(Nevison, J. M.; 1987); M-M Dec 90 46-47
Matsuda, Yoshio, see Hidaka, Hideto, M-M Apr 90 14-25 
McGarity, Ralph C., see Edenfield, Robin W., M-M Feb 90 66-78 
McGarity, Ralph C., see Edenfield, Robin W., M-M Jun 90 22-35 
McLeod, John, see Birman, Mark, M-M Feb 90 55-64 
Milenkovic, Milan. Microprocessor memory management units; M-M 
Apr 90 70-85 

Miller, Christine. Micro News—Silicon Glen: The European challenge;
M-M Jun 90 1

Mori, Hiroyuki, see Hatano, Yuji, M-M Apr 90 40-55 
Murata, David, see Brown, Emil W., M-M Feb 90 10-22 

N 

Nakajima, Masaitsu, see Kaneko, Katsuyuki, M-M Apr 90 26-38 
Nakakura, Yasuhiro, see Kaneko, Katsuyuki, M-M Apr 90 26-38 
Nishikawa, Junji, see Kaneko, Katsuyuki, M-M Apr 90 26-38

O 

Odijk, Eddy, see America, Pierre, M-M Dec 90 12-15, 61-75 
Okabayashi, Ichiro, see Kaneko, Katsuyuki, M-M Apr 90 26-38 
Omnes, Jean-Francois, Guest Ed., Thierry Van der Pyl, and Philip 
Treleaven, Guest Eds. Parallel computing in Europe; M-M Dec 90 
8-10 

O'Reilly, Maureen P., see Bier, Jeffrey C., M-M Oct 90 28-45 

P 

Paterson, Tim. On second thought... (Ltr.); M-M Apr 90 5 
Pennello, Thomas J. Compiler challenges with RISCs; M-M Feb 90 
37-43 

Peterson, Wayne A., see Chassaing, Rulph, M-M Oct 90 54-62 
Petolino, Joseph, see Brown, Emil W., M-M Feb 90 10-22 
Petrasko, Brian, see Kumar, Krishna A., M-M Oct 90 46-53 
Pierson, Michael J., see Govers, Francis X., III, M-M Oct 90 73-75
Priem, Curtis R. Developing the GX graphics accelerator architecture; 
M-M Feb 90 44-54 

Pulling, David, see Darley, Merrick, M-M Jun 90 36-47 

Q 

Quintana, Eric E., see Edenfield, Robin W., M-M Feb 90 66-78 
Quintana, Eric E., see Edenfield, Robin W., M-M Jun 90 22-35 

R 

Reeve, Mike, see Hammer, Carsten, M-M Dec 90 20-23, 83-88 
Regimbal, Denis, see Davis, Henry, M-M Oct 90 17-27 
Reininger, Russell A., see Edenfield, Robin W., M-M Feb 90 66-78 
Reininger, Russell A., see Edenfield, Robin W., M-M Jun 90 22-35 
Rounce, P. A., and J. Delgado. SPRINT and DICE: Architectures within 
the ESPRIT SPAN Project; M-M Dec 90 24-27, 88-97 
Rumsey, Michael, and John Sackett. An ASIC methodology for mixed 
analog-digital simulation; M-M Aug 90 34-40 

S 

Sackett, John, see Rumsey, Michael, M-M Aug 90 34-40 
Sakamura, Ken, Guest Ed., The current Japanese computer scene 
(Special issue intro.); M-M Apr 90 12 
Samuels, Allen, see Birman, Mark, M-M Feb 90 55-64 
Satoh, Taizo, see Kitahara, Takeshi, M-M Jun 90 68-75 
Sibai, F. N., K. L. Watson, and Mi Lu. A parallel unification machine; 
M-M Aug 90 21-33 

Sih, Gilbert C., see Bier, Jeffrey C., M-M Oct 90 28-45 
Sijstermans, Frans, see America, Pierre, M-M Dec 90 12-15, 61-75 
Sisto, Riccardo, see Albertengo, Guido, M-M Oct 90 63-71 
Slater, Michael. Micro View—The view from 10,000 feet; M-M Feb 90 
96-95 

Slater, Michael. Micro View—Who needs faster processors?; M-M Apr 
90 96, 95 

Slater, Michael. Micro View—What is RISC?; M-M Jun 90 96, 95 
Slater, Michael. Micro View—Failings of the patent system; M-M Aug 
90 96, 95 

Slater, Michael. Micro View—Protecting computer architecture; M-M
Oct 90 96, 95
Slater, Michael. Micro View—Phelps rules Intel breached contract with
AMD; M-M Dec 90 96-95

Stern, Richard H. Micro Law—Appropriate and inappropriate legal 
protection of user interfaces and screen displays—V: How different 
forms of copyright protection interact with policy; M-M Feb 90 79-84 
Stern, Richard H. Micro Law—Software patents; M-M Apr 90 8-11 
Stern, Richard H. Micro Law—Professional ethics and the law; M-M 
Jun 90 83-84 

Stern, Richard H. Micro Law—More on software patents; M-M Aug 90 
7-9 

Stern, Richard H. Micro Law—I: The Paperback case; M-M Oct 90 7-10 
Stern, Richard H. Micro Law—The Paperback case—II: A ‘nonliteral’ 
analysis; M-M Dec 90 39-41 

Stock, S. J. On the Edge—Low-cost CAD drawings; M-M Aug 90 77-78 
Suzuki, Nariko, see Kaneko, Hiroaki, M-M Apr 90 56-69 

T 

Treleaven, Philip, Guest Ed., see Omnes, Jean-Francois, Guest Ed., M-M 
Dec 90 8-10 

V 

van der Hoeven, A. J., A. A. J. de Lange, E. F. Deprettere, and P. M. 
Dewilde. A model for the high-level description and simulation of 
VLSI networks; M-M Aug 90 41-48 

Van der Pyl, Thierry, Guest Ed., see Omnes, Jean-Francois, Guest Ed., 
M-M Dec 90 8-10 

van Twist, Rob, see America, Pierre, M-M Dec 90 12-15, 61-75 

W 

Wabuka, Hiroshi, see Kaneko, Hiroaki, M-M Apr 90 56-69 
Wang, Paul, see Darley, Merrick, M-M Jun 90 36-47 
Warren, Carl. Micro Standards—A busy year ahead; M-M Feb 90 85-86 
Warren, Carl. On the Edge—Wire-to-wire interaction; M-M Feb 90 
87-88 

Warren, Carl. On the Edge—Realizing a transmission model; M-M Jun 
90 76-79 

Warren, Carl. Micro Standards—The scalable coherent interface; M-M 
Jun 90 80-82 

Warren, Carl. Micro Standards—Backplane measurements; M-M Aug 
90 75-77 

Warren, Carl. Micro Standards—Keeping up with Uncle Sam; M-M Oct 
90 72-73 

Warren, Carl. Micro Standards—A performance standard to consider; 
M-M Dec 90 42-45 

Watson, K. L., see Sibai, F. N., M-M Aug 90 21-33 
Wester, Rogier, see America, Pierre, M-M Dec 90 12-15, 61-75 
Whitby-Strevens, Colin. Transputers—Past, present and future; M-M 
Dec 90 16-19, 76-82 

Y 

Yamada, Hiroji, see Hatano, Yuji, M-M Apr 90 40-55 
Yang, Larry, see Darley, Merrick, M-M Jun 90 36-47 
Yano, Shinichiro, see Hatano, Yuji, M-M Apr 90 40-55 

SUBJECT INDEX 

A 

Analog-digital conversion 

merging Sigma-Delta A/D converters and DSPs for mixed-signal 
processors. Davis, Henry, + , M-M Oct 90 17-27 

Application-specific integrated circuits 

AMP (Analog Modeling Package), ASIC methodology for mixed 
analog-digital simulation. Rumsey, Michael, + , M-M Aug 90 34-40 

Arithmetic; cf. Floating-point arithmetic 
Artificial intelligence 

book review; The Fifth Generation Fallacy (Unger, J. M.; 1987). 
Mateosian, Richard, M-M Feb 90 7-8 

B 

Bandpass filters 

TMS320C25-based multirate filter. Chassaing, Rulph, + , M-M Oct 90 
54-62 


Bipolar integrated circuits; cf. Emitter-coupled logic 
Block coding; cf. Cyclic coding 
Book reviews 

Cache and Memory Hierarchy Design—A Performance-Directed 
Approach (Przybylski, S. A.; 1990). Mateosian, Richard, M-M Dec 
90 46-47 

CASE—Using Software Development Tools (Fisher, A. S.; 1988).
Mateosian, Richard, M-M Apr 90 87
Computer Architecture—A Quantitative Approach (Patterson, D. A., 
and Hennessy, J. L.; 1990). Mateosian, Richard, M-M Jun 90 5 
Mastering Technical Writing (Mancuso, J. C.; 1990). Mateosian, 
Richard, M-M Oct 90 76-77 

Structured Walkthroughs, 4th edn. (Yourdon, E.; 1989). Mateosian, 
Richard, M-M Apr 90 86-87 

The Elements of Spreadsheet Style (Nevison, J. M.; 1987). Mateosian, 
Richard, M-M Dec 90 46-47 

The Fifth Generation Fallacy (Unger, J. M.; 1987). Mateosian, 
Richard, M-M Feb 90 7-8 

The Matrix—Computer Networks and Conferencing Systems
Worldwide (Quarterman, J. S.; 1990). Mateosian, Richard, M-M Apr
90 87

Business; cf. International trade 

C 

Cache memories 

68040 processor memory subsystem, external bus, chip and board 
testing, and design verification. Edenfield, Robin W., +, M-M Jun 90 
22-35 

architecture of 88000 family of high-performance 32-bit 
microprocessors. Alsup, Mitch, M-M Jun 90 48-66 
book review; Cache and Memory Hierarchy Design—A
Performance-Directed Approach (Przybylski, S. A.; 1990).
Mateosian, Richard, M-M Dec 90 46-47
cache DRAM, hierarchical RAM with 1-Mb DRAM main memory 
and 8-kb SRAM cache. Hidaka, Hideto, + , M-M Apr 90 14-25 
i486 CPU, 386-compatible processor with cache integrated into 
instruction pipeline. Crawford, John H., M-M Feb 90 27-36 
implementation of B5000 Sparc microprocessor in ECL. Brown, Emil 
W., + , M-M Feb 90 10-22 

the Gmicro/300 32-bit microprocessor instruction execution, pipeline
structure, and effect of internal caches. Kitahara, Takeshi, + , M-M
Jun 90 68-75

CAD (computer-aided design); cf. Design automation 
CASE; cf. Computer-aided software engineering 
Coding/decoding; cf. Cyclic coding 
Communication systems; cf. Teletext/videotex 
Compilers 

crafting compilers for RISC processors. Pennello, Thomas J., M-M 
Feb 90 37-43 

Computation time 

parallel unification machine for speeding up operation in execution of 
logic programs. Sibai, F. N., + , M-M Aug 90 21-33 
Computer-aided design; cf. Design automation 
Computer-aided software engineering 

book review; CASE—Using Software Development Tools (Fisher, A. 
S.; 1988). Mateosian, Richard, M-M Apr 90 87 
Computer architecture 

book review; Computer Architecture—A Quantitative Approach 
(Patterson, D. A., and Hennessy, J. L.; 1990). Mateosian, Richard, 
M-M Jun 90 5 

comments on ‘A comparison of RISC architectures’ by R. S. Piepho 
and W. S. Wu. Luu, J., M-M Apr 90 5 (Original paper, Aug 89 51-62) 
parallel unification machine for speeding up operation in execution of 
logic programs. Sibai, F. N., + , M-M Aug 90 21-33 
protecting computer architectures legally (Micro View). Slater, 
Michael, M-M Oct 90 96, 95 


Computer buses; cf. Data buses 

Computer communication; cf. Computer networks 

Computer graphics 

GX graphics accelerator architecture. Priem, Curtis R., M-M Feb 90 
44-54 

Computer input/output; cf. Computer interfaces 
Computer instructions; cf. Microcomputer instructions 
Computer interfaces 

appropriate and inappropriate legal protection of user interfaces and
screen displays, part V (Micro Law). Stern, Richard H., M-M Feb 90
79-84

Lotus Development Corp. vs. Paperback Software International; court
decision in copyright suit (Micro Law). Stern, Richard H., M-M Dec
90 39-41

Lotus Development Corp. vs. Paperback Software International; issues
and decision in copyright dispute (Micro Law). Stern, Richard H.,
M-M Oct 90 7-10

personal-experience-based comments on legal protection of screen 
displays. Paterson, Tim, M-M Apr 90 5 

Computer interfaces; cf. Microcomputer interfaces 
Computer language processors; cf. Compilers 
Computer languages 

European Declarative System (EDS) database and languages. 
Hammer, Carsten, + , M-M Dec 90 20-23, 83-88 

parallel computers for advanced information processing; survey of 
ESPRIT Project 415. America, Pierre, + , M-M Dec 90 12-15,61-75 

Computer languages; cf. Hardware design languages 
Computer networks 

book review; The Matrix—Computer Networks and Conferencing
Systems Worldwide (Quarterman, J. S.; 1990). Mateosian, Richard,
M-M Apr 90 87

Computer pipeline processing; cf. Parallel processing; Pipeline 
processing 
Computers 

need for faster processors (Micro View). Slater, Michael, M-M Apr 90 
96, 95 

Control systems; cf. Rail-transportation control systems 
Copyright protection; cf. Software protection 
Cyclic coding 

parallel encoding of cyclic redundant codes. Albertengo, Guido, + ,
M-M Oct 90 63-71

D 

Data buses 

68040 processor memory subsystem, external bus, chip and board 
testing, and design verification. Edenfield, Robin W., +, M-M Jun 90 
22-35 

data-ordering issues for multiplexed buses. James, David V., M-M Jun 
90 9-21 

Futurebus+ protocol stack and profiles. Harrison, Beasley, M-M Jun 
90 2, 92-93 

operation of serial (wire) bus (On the Edge). Warren, Carl, M-M Feb 
90 87-88 

overview of IEEE draft standard P1194.0/D2, Backplane Electrical
Performance Measurement (Micro Standards). Warren, Carl, M-M
Aug 90 75-77

overview of proposed IEEE Standard P1596 for scalable coherent
interface (Micro Standards). Warren, Carl, M-M Jun 90 80-82

Data communication; cf. Computer networks 
Data-flow computing 

designing a custom DSP circuit using VHDL. Kumar, Krishna A., + , 
M-M Oct 90 46-53 

Data processing; cf. List processing 
Database systems 

European Declarative System (EDS) database and languages. 
Hammer, Carsten, + , M-M Dec 90 20-23, 83-88 

Delta modulation 

merging Sigma-Delta A/D converters and DSPs for mixed-signal 
processors. Davis, Henry, + , M-M Oct 90 17-27 


Design automation 

applicative state transition model for high-level description and
simulation of VLSI networks. van der Hoeven, A. J., + , M-M Aug
90 41-48

Design automation; cf. Design automation software 
Design automation software 

AMP (Analog Modeling Package), ASIC methodology for mixed 
analog-digital simulation. Rumsey, Michael, + , M-M Aug 90 34-40 
Gabriel, design environment for DSPs. Bier, Jeffrey C., + , M-M Oct 
90 28-45 

realizing a transmission model in SPICE (On the Edge). Warren, Carl,
M-M Jun 90 76-79

software for low-cost CAD drawings on low-cost PC (On the Edge). 
Stock, S. J., M-M Aug 90 77-78 

Digital filters 

TMS320C25-based multirate filter. Chassaing, Rulph, + , M-M Oct 
90 54-62 

Digital image processing; cf. Image processing 

Digital integrated circuits; cf. Integrated circuits; Memories;
Very-large-scale integration
Digital systems 

hierarchical discrete-event simulation on hypercube architectures;
application to digital system simulation. Chamberlain, Roger D., + ,
M-M Aug 90 10-20

Discrete Fourier transforms 

AMP (Analog Modeling Package), ASIC methodology for mixed 
analog-digital simulation. Rumsey, Michael, + , M-M Aug 90 34-40 

Discrete-time filters; cf. Digital filters 
Displays 

appropriate and inappropriate legal protection of user interfaces and
screen displays, part V (Micro Law). Stern, Richard H., M-M Feb 90
79-84

Displays; cf. Computer graphics 

E 

East Germany 

effect of reunification on East German electronics industry (Micro 
World). Kirrmann, Hubert, M-M Dec 90 

ECL; cf. Emitter-coupled logic 
Electronics industry 

effect of reunification on East German electronics industry (Micro 
World). Kirrmann, Hubert, M-M Dec 90 
implications of events in Scotland’s Silicon Glen; interview with 
Robert Crawford. Miller, Christine, M-M Jun 90 1 

Emitter-coupled logic 

implementation of B5000 Sparc microprocessor in ECL. Brown, Emil
W., + , M-M Feb 90 10-22

Engineering profession 

professional ethics and the law (Micro Law). Stern, Richard H., M-M 
Jun 90 83-84 

Engineering writing; cf. Writing 
Ethics; cf. Legal factors 
Europe 

ESPRIT SPAN Project on integrating symbolic and numerical
computing on parallel systems; overview and description of SPRINT
and DICE architectures. Rounce, P. A., + , M-M Dec 90 24-27, 88-97
European Declarative System (EDS) database and languages.
Hammer, Carsten, + , M-M Dec 90 20-23, 83-88
implications of events in Scotland’s Silicon Glen; interview with
Robert Crawford. Miller, Christine, M-M Jun 90 1
parallel computers for advanced information processing; survey of
ESPRIT Project 415. America, Pierre, + , M-M Dec 90 12-15, 61-75
parallel computing in Europe (special issue). M-M Dec 90 8-10
PYGMALION; survey of ESPRIT 2 Project 2059 on neurocomputing.
Angeniol, Bernard, M-M Dec 90 28-31, 99-102
transputers; review of European developments and applications.
Whitby-Strevens, Colin, M-M Dec 90 16-19, 76-82

F

Filters; cf. Bandpass filters; Digital filters; Sampled-data filters 
Firmware; cf. Microprogramming 
Floating-point arithmetic 

68040 processor design and implementation; operation of integer and
floating-point units. Edenfield, Robin W., + , M-M Feb 90 66-78

TMS390C602A floating-point coprocessor for Sparc systems. Darley,
Merrick, + , M-M Jun 90 36-47

WTL3170/3171 Sparc floating-point coprocessors; design and 
development. Birman, Mark, + , M-M Feb 90 55-64 

Fourier transforms; cf. Discrete Fourier transforms 

G 

Governmental activities/factors 

openness at standards meetings and US policy on sharing information 
(Micro Standards). Warren, Carl, M-M Oct 90 72-73 

Graphics; cf. Computer graphics 

H 

Hardware design languages 

applicative state transition model for high-level description and
simulation of VLSI networks. van der Hoeven, A. J., + , M-M Aug
90 41-48

designing a custom DSP circuit using VHDL. Kumar, Krishna A., + , 
M-M Oct 90 46-53 

Hierarchical memories; cf. Memory hierarchies 
Hierarchical systems 

cache DRAM, hierarchical RAM with 1-Mb DRAM main memory 
and 8-kb SRAM cache. Hidaka, Hideto, + , M-M Apr 90 14-25 

hierarchical discrete-event simulation on hypercube architectures;
application to digital system simulation. Chamberlain, Roger D., + ,
M-M Aug 90 10-20
History 

history and overview of programmable digital signal processors
(DSPs). Lee, Edward A., M-M Oct 90 14-16

Home communication systems; cf. Teletext/videotex 

I

IEEE standards

Futurebus+ protocol stack and profiles. Harrison, Beasley, M-M Jun 
90 2, 92-93 

overview of IEEE Draft Standard P1194.0/D2, Backplane Electrical
Performance Measurement (Micro Standards). Warren, Carl, M-M
Aug 90 75-77

overview of proposed IEEE Standard P1596 for scalable coherent
interface (Micro Standards). Warren, Carl, M-M Jun 90 80-82

Image processing 

PYGMALION; survey of ESPRIT 2 Project 2059 on neurocomputing.
Angeniol, Bernard, M-M Dec 90 28-31, 99-102
Information systems; cf. Database systems
Integrated-circuit testing; cf. Microprocessor testing
Integrated circuits; cf. Application-specific integrated circuits;
Very-large-scale integration
Integrated circuits industry; cf. Electronics industry 
International trade 

impact of Cocom (Coordinating Committee on Multilateral Export
Controls) restrictions (Micro World). Kirrmann, Hubert, M-M Feb
90 4

J 

Japan 

book review; The Fifth Generation Fallacy (Unger, J. M.; 1987). 
Mateosian, Richard, M-M Feb 90 7-8 

innovative technology in the Far East (special issue). M-M Apr 90 
14-69 

report on visit to various university and nonuniversity research centers 
in Japan (Software Report). Kahaner, David K., M-M Aug 90 4-6 

Josephson device logic 

4-bit, 250-MIPS processor using Josephson technology. Hatano, Yuji, 
+ , M-M Apr 90 40-55 


L 

Languages; cf. Computer languages 

Laplace transforms 

AMP (Analog Modeling Package), ASIC methodology for mixed 
analog-digital simulation. Rumsey, Michael, + , M-M Aug 90 34-40 

Legal factors 

appropriate and inappropriate legal protection of user interfaces and
screen displays, part V (Micro Law). Stern, Richard H., M-M Feb 90
79-84

court ruling that Intel breached contract with AMD (Micro View).
Slater, Michael, M-M Dec 90 96-95
personal-experience-based comments on legal protection of screen 
displays. Paterson, Tim, M-M Apr 90 5 
professional ethics and the law (Micro Law). Stern, Richard H., M-M 
Jun 90 83-84 

protecting computer architectures legally (Micro View). Slater, 
Michael, M-M Oct 90 96, 95 

Legal factors; cf. Patents; Software protection 

List processing 

ASLP, PC-based highly pipelined low-cost microprogrammable list 
processor with two memory modules. Lee, K. H., + , M-M Aug 90 
50-61 

Logic circuits; cf. Emitter-coupled logic; Josephson device logic 

Logic programming 

European Declarative System (EDS) database and languages.
Hammer, Carsten, + , M-M Dec 90 20-23, 83-88
parallel unification machine for speeding up operation in execution of
logic programs. Sibai, F. N., + , M-M Aug 90 21-33

M 

Magnetic logic devices; cf. Josephson device logic 

Meetings 

1989 (First) Annual Hot Chips Symposium, Part 1 (special issue). M-M 
Feb 90 10-78 

1989 (First) Annual Hot Chips Symposium, Part 2 (special issue). M-M 
Jun 90 9-66 

report on 1990 (2nd) Int. Workshop on Software Quality Improvement 
(Software Report). Kahaner, David K., M-M Dec 90 48-51 

Memories; cf. Cache memories; Random-access memories; Virtual 
memories 

Memory hierarchies 

book review; Cache and Memory Hierarchy Design—A 
Performance-Directed Approach (Przybylski, S. A.; 1990). 
Mateosian, Richard, M-M Dec 90 46-47 

Memory management 

68040 processor memory subsystem, external bus, chip and board
testing, and design verification. Edenfield, Robin W., +, M-M Jun 90
22-35

architecture of 88000 family of high-performance 32-bit
microprocessors. Alsup, Mitch, M-M Jun 90 48-66
overview of rationale, principles, and hardware support for 
microprocessor virtual memories. Milenkovic, Milan, M-M Apr 90 
70-85 

Microcomputer architecture 

architecture of 88000 family of high-performance 32-bit
microprocessors. Alsup, Mitch, M-M Jun 90 48-66

Microcomputer instructions 

68040 processor design and implementation; operation of integer and
floating-point units. Edenfield, Robin W., + , M-M Feb 90 66-78
architecture of 88000 family of high-performance 32-bit
microprocessors. Alsup, Mitch, M-M Jun 90 48-66
clarifying what is and is not RISC (Micro View). Slater, Michael, M-M 
Jun 90 96, 95 

comments on ‘A comparison of RISC architectures’ by R. S. Piepho 
and W. S. Wu. Luu, J., M-M Apr 90 5 (Original paper, Aug 89 51-62) 
crafting compilers for RISC processors. Pennello, Thomas J., M-M 
Feb 90 37-43 

i486 CPU, 386-compatible processor with cache integrated into 
instruction pipeline. Crawford, John H., M-M Feb 90 27-36 
the Gmicro/300 32-bit microprocessor instruction execution, pipeline 
structure, and effect of internal caches. Kitahara, Takeshi, + , M-M 
Jun 90 68-75 

Microcomputer interfaces 

overview of proposed IEEE Standard P1596 for scalable coherent 
interface (Micro Standards). Warren, Carl, M-M Jun 90 80-82 

Microcomputer performance 

standard approach to system performance measurement (Micro 
Standards). Warren, Carl, M-M Dec 90 42-45 

Microcomputer software 

Arabic text-to-speech conversion on a personal computer. El-Imam, 
Yousif A., + , M-M Aug 90 62-74 

Pax parallel computer family; overview of development and 
characteristics (Software Report). Kahaner, David K., M-M Oct 90 
5-6, 91-93 

software for low-cost CAD drawings on low-cost PC (On the Edge). 
Stock, S. J., M-M Aug 90 11-13 

Microcomputer software design/development 

crafting compilers for RISC processors. Pennello, Thomas J., M-M 
Feb 90 37-43 

eliminating software bottlenecks from chip design; case history of 
RISC development. Johnson, Stephen C., M-M Feb 90 23-26 

Microcomputer testing; cf. Microprocessor testing 

Microprocessor testing 

68040 processor memory subsystem, external bus, chip and board 
testing, and design verification. Edenfield, Robin W., +, M-M Jun 90 
22-35 

Microprocessors 

1989 (First) Annual Hot Chips Symposium, Part 1 (special issue). M-M 
Feb 90 10-78 

1989 (First) Annual Hot Chips Symposium, Part 2 (special issue). M-M 
Jun 90 9-66 

32-b V80 microprocessor; internal hardware structure, pipeline 
operation, and system support functions. Kaneko, Hiroaki, + , M-M 
Apr 90 56-69 

4-bit, 250-MIPS processor using Josephson technology. Hatano, Yuji, 
+ , M-M Apr 90 40-55 

68040 processor design and implementation; operation of integer and 
floating-point units. Edenfield, Robin W., + , M-M Feb 90 66-78 
68040 processor memory subsystem, external bus, chip and board 
testing, and design verification. Edenfield, Robin W., + , M-M Jun 90 
22-35 

architecture of 88000 family of high-performance 32-bit 
microprocessors. Alsup, Mitch, M-M Jun 90 48-66 
eliminating software bottlenecks from chip design; case history of 
RISC development. Johnson, Stephen C., M-M Feb 90 23-26 
i486 CPU, 386-compatible processor with cache integrated into 
instruction pipeline. Crawford, John H., M-M Feb 90 27-36 
implementation of B5000 Sparc microprocessor in ECL. Brown, Emil 
W., + , M-M Feb 90 10-22 

innovative technology in the Far East (special issue). M-M Apr 90 
14-69 

the Gmicro/300 32-bit microprocessor instruction execution, pipeline 
structure, and effect of internal caches. Kitahara, Takeshi, + , M-M 
Jun 90 68-75 

TMS320C25-based multirate filter. Chassaing, Rulph, +, M-M Oct 90 
54-62 

TMS390C602A floating-point coprocessor for Sparc systems. Darley, 
Merrick, + , M-M Jun 90 36-47 

transputers review of European developments and applications. 
Whitby-Strevens, Colin, M-M Dec 90 16-19, 76-82 
where microprocessors have been in past decade and where they will 
go in next (Micro View). Slater, Michael, M-M Feb 90 96, 95 
WTL3170/3171 Sparc floating-point coprocessors; design and 
development. Birman, Mark, + , M-M Feb 90 55-64 


Microprogramming 

ASLP, PC-based highly pipelined low-cost microprogrammable list 
processor with two memory modules. Lee, K. H., + , M-M Aug 90 
50-61 

Multilevel systems; cf. Hierarchical systems 

Multiplexing 

data-ordering issues for multiplexed buses. James, David V., M-M Jun 
90 9-21 

Multiprocessing 

ESPRIT SPAN Project on integrating symbolic and numerical 
computing on parallel systems; overview and description of SPRINT 
and DICE architectures. Rounce, P. A., +, M-M Dec 90 24-27, 88-97 
European Declarative System (EDS) database and languages. 
Hammer, Carsten, + , M-M Dec 90 20-23, 83-88 
hierarchical discrete-event simulation on hypercube architectures; 
application to digital system simulation. Chamberlain, Roger D., + , 
M-M Aug 90 10-20 

parallel computers for advanced information processing; survey of 
ESPRIT Project 415. America, Pierre, + , M-M Dec 90 12-15, 61-75 
parallel computing in Europe (special issue). M-M Dec 90 8-10 
parallel unification machine for speeding up operation in execution of 
logic programs. Sibai, F. N., + , M-M Aug 90 21-33 
Pax parallel computer family; overview of development and 
characteristics (Software Report). Kahaner, David K., M-M Oct 90 
5-6, 91-93 

transputers review of European developments and applications. 
Whitby-Strevens, Colin, M-M Dec 90 16-19, 76-82 

N 

Networks; cf. Computer networks; Neural networks; Petri nets 

Neural networks 

PYGMALION; survey of ESPRIT 2 Project 2059 on neurocomputing. 
Angeniol, Bernard, M-M Dec 90 28-31, 99-102 

O 

Object-oriented programming 

parallel computers for advanced information processing; survey of 
ESPRIT Project 415. America, Pierre, + , M-M Dec 90 12-15, 61-75 

P 

Parallel processing 

encoding of cyclic redundant codes. Albertengo, Guido, + , M-M Oct 
90 63-71 

VLSI-based processing element design for ADENA high-performance 
parallel computer. Kaneko, Katsuyuki, + , M-M Apr 90 26-38 

Parallel processing; cf. Pipeline processing 

Patents 

benefits and problems of software patents (Micro Law). Stern, Richard 
H., M-M Apr 90 8-11 

examination of US Federal Circuit Court decision on software patent 
case involving EE issues (Micro Law). Stern, Richard H., M-M Aug 
901-9 

failings of US patent system (Micro View). Slater, Michael, M-M Aug 
90 96, 95 

Petri nets 

applicative state transition model for high-level description and 
simulation of VLSI networks. van der Hoeven, A. J., + , M-M Aug 
90 41-48 

Pipeline processing 

32-b V80 microprocessor; internal hardware structure, pipeline 
operation, and system support functions. Kaneko, Hiroaki, + , M-M 
Apr 90 56-69 

68040 processor design and implementation; operation of integer and 
floating-point units. Edenfield, Robin W., + , M-M Feb 90 66-78 
ASLP, PC-based highly pipelined low-cost microprogrammable list 
processor with two memory modules. Lee, K. H., + , M-M Aug 90 
50-61 

the Gmicro/300 32-bit microprocessor instruction execution, pipeline 
structure, and effect of internal caches. Kitahara, Takeshi, + , M-M 
Jun 90 68-75 

Programming; cf. Software design/development 
Pulse-code modulation; cf. Delta modulation 

Q 

Quality control; cf. Software quality 
Quantization; cf. Analog-digital conversion 

R 

Rail transportation 

using microprocessors in trains and train buses (Micro World). 
Kirrmann, Hubert, M-M Oct 90 79-80 

Rail-transportation control systems 

train control automation in Europe (Micro World). Kirrmann, Hubert, 
M-M Aug 90 79-80 

RAM; cf. Random-access memories 
Random-access memories 

cache DRAM, hierarchical RAM with 1-Mb DRAM main memory 
and 8-kb SRAM cache. Hidaka, Hideto, +, M-M Apr 90 14-25 

RD&E 

report on visit to various university and nonuniversity research centers 
in Japan (Software Report). Kahaner, David K., M-M Aug 90 4-6 

S 

Sampled-data filters 

TMS320C25-based multirate filter. Chassaing, Rulph, + , M-M Oct 90 
54-62 

Semiconductor electronics industry; cf. Electronics industry 
Signal processing 

designing a custom DSP circuit using VHDL. Kumar, Krishna A., + , 
M-M Oct 90 46-53 

Gabriel, design environment for DSPs. Bier, Jeffrey C., + , M-M Oct 
90 28-45 

history and overview of programmable digital signal processors 
(DSPs). Lee, Edward A., M-M Oct 90 14-16 
maturing of digital signal processing (special issue). M-M Oct 90 11-62 
merging Sigma-Delta A/D converters and DSPs for mixed-signal 
processors. Davis, Henry, +, M-M Oct 90 17-27 
PYGMALION; survey of ESPRIT 2 Project 2059 on neurocomputing. 
Angeniol, Bernard, M-M Dec 90 28-31, 99-102 
Signal sampling/reconstruction; cf. Analog-digital conversion 
Simulation 

hierarchical discrete-event simulation on hypercube architectures; 
application to digital system simulation. Chamberlain, Roger D., + , 
M-M Aug 90 10-20 

Software; cf. Computer languages; Design automation software; 
Microcomputer software 

Software design/development 

book review; The Elements of Spreadsheet Style (Nevison, J. M.; 
1987). Mateosian, Richard, M-M Dec 90 
report on visit to various university and nonuniversity research centers 
in Japan (Software Report). Kahaner, David K., M-M Aug 90 4-6 
using transaction analysis to develop software for space station 
Freedom (On the Edge). Govers, Francis X., III, + , M-M Oct 90 
73-75 

Software design/development; cf. Microcomputer software design/ 
development; Software development environments 
Software development environments 

PYGMALION; survey of ESPRIT 2 Project 2059 on neurocomputing. 
Angeniol, Bernard, M-M Dec 90 28-31, 99-102 
Software development management 

book review; Structured Walkthroughs, 4th edn. (Yourdon, E.; 1989). 
Mateosian, Richard, M-M Apr 90 86-87 
Software protection 

benefits and problems of software patents (Micro Law). Stern, Richard 
H., M-M Apr 90 8-11 

examination of US Federal Circuit Court decision on software patent 
case involving EE issues (Micro Law). Stern, Richard H., M-M Aug 
901-9 

Lotus Development Corp. vs. Paperback Software International; court 
decision in copyright suit (Micro Law). Stern, Richard H., M-M Dec 
90 39-41 


Lotus Development Corp. vs. Paperback Software International; issues 
and decision in copyright dispute (Micro Law). Stern, Richard H., 
M-M Oct 90 7-10 

Software quality 

report on 1990 (2nd) Int. Workshop on Software Quality Improvement 
(Software Report). Kahaner, David K., M-M Dec 90 48-51 

Source coding; cf. Delta modulation 

Space stations 

using transaction analysis to develop software for space station 
Freedom (On the Edge). Govers, Francis X., III, + , M-M Oct 90 
73-75 

Special issues/sections 

1989 (First) Annual Hot Chips Symposium, Part 1. M-M Feb 90 10-78 
1989 (First) Annual Hot Chips Symposium, Part 2. M-M Jun 90 9-66 
innovative technology in the Far East. M-M Apr 90 14-69 
maturing of digital signal processing. M-M Oct 90 11-62 
parallel computing in Europe. M-M Dec 90 8-10 

Speech processing 

PYGMALION; survey of ESPRIT 2 Project 2059 on neurocomputing. 
Angeniol, Bernard, M-M Dec 90 28-31, 99-102 

Speech synthesis 

Arabic text-to-speech conversion on a personal computer. El-Imam, 
Yousif A., + , M-M Aug 90 62-74 

Standards 

openness at standards meetings and US policy on sharing information 
(Micro Standards). Warren, Carl, M-M Oct 90 72-73 
system performance measurement; standard approach (Micro 
Standards). Warren, Carl, M-M Dec 90 42-45 

Standards; cf. IEEE standards 

T 

Teleconferencing 

book review; The Matrix—Computer Networks and Conferencing 
Systems Worldwide (Quarterman, J. S.; 1990). Mateosian, Richard, 
M-M Apr 90 87 

Teletext/videotex 

minitel, terminal for accessing French Teletel videotex service (Micro 
World). Kirrmann, Hubert, M-M Apr 90 88-90 

Text processing 

Arabic text-to-speech conversion on a personal computer. El-Imam, 
Yousif A., + , M-M Aug 90 62-74 

Trade; cf. International trade 

Transforms; cf. Discrete Fourier transforms; Laplace transforms 

Transmission lines 

realizing a transmission model in SPICE (On the Edge). Warren, Carl, 
M-M Jun 90 76-79 


V 

Very-large-scale integration 

applicative state transition model for high-level description and 
simulation of VLSI networks. van der Hoeven, A. J., + , M-M Aug 
90 41-48 

VLSI-based processing element design for ADENA high-performance 
parallel computer. Kaneko, Katsuyuki, + , M-M Apr 90 26-38 

Videotex; cf. Teletext/videotex 

Virtual memories 

overview of rationale, principles, and hardware support for 
microprocessor virtual memories. Milenkovic, Milan, M-M Apr 90 
70-85 

VLSI; cf. Very-large-scale integration 


book review; Mastering Technical Writing (Mancuso, J. C.; 1990). 
Mateosian, Richard, M-M Oct 9016-11 


U 

United States; cf. Governmental activities/factors 
Micro Law 


Richard H. Stern 

Law Office of 
Richard H. Stern 
1300 19th Street NW, 
Suite 400 

Washington, DC 20036 


The Paperback Case: Part 2, A “nonliteral” analysis 


The court’s opinion in Lotus Development 
Corp. v. Paperback Software International 1 
did not approach the case in terms 
of statutorily listed, or even previously known, 
copyrightable subject matter. Instead, the court 
decided to devise a new species of copyrightable 
subject matter. It held that the “nonliteral” as¬ 
pect of the Lotus 1-2-3 user interface that was 
embodied or reflected in its “command struc¬ 
ture” was copyrighted. 

The court defined the protected command 
structure as “the menu structure, taken as a 
whole—including the choice of command terms, 
the structure and order of those terms, their pre¬ 
sentation on the screen, and the long prompts.” 

(The choice of command words means the 
collection of commands in the menu bars of 
1-2-3. The structure and order of those terms 
means the tree relationship. That is the “tree” 
depicted in Figure 2 of Part 1. The presenta¬ 
tion on the screen probably means the menu 
bars, although the court did not spell that 
point out. The so-called long prompts are short 
explanatory messages that sometimes appear 
on the second line of a menu bar instead of 
a further list of commands. For example, un¬ 
der “Global,” the explanation “Set worksheet 
settings” appears. The content of these prompts 
is minimal. It reflects both the simplicity of 
the message and the relatively few ways pos¬ 
sible to state it. For all practical purposes, 
the gist of what the court decided to protect 
is the command set and the command tree.) 

In the court’s view, the nonliteral aspects of 
a computer program should be protected by 
copyright law. After all, the plot, characteriza¬ 
tion, and details of dialogue of a novel or play 
would be protected against nonverbatim copy¬ 
ing. Why should a computer program be 
different? Basically, the court felt the nonliteral 
aspects of 1-2-3, specifically the user interface 
and command structure, were the most valuable 
part of the work as a whole. A major problem 
with this approach is that it is so unpredictable 


Highlights 

Part 1 of this series described the most recent copyright decision in the screen display/user 
interface field, Lotus Development Corp. v. Paperback Software International. The decision cre¬ 
ated enormous concern in the software industry, because the court went out of its way to expound 
a broad theory that copyright law protects an unspecified collection of “nonliteral” aspects of 
computer programs. 

Part 1 discussed the 1-2-3 computer program, command tree, and menu bars. It also explored 
the reason the defendant copied the command tree. The defendant claimed that the 1-2-3 input- 
command user interface had become an established convention in the spreadsheet field, a de 
facto standard. 

The court chose not to analyze the issues in terms of the menu bars as conventional subject 
matter of copyright, such as pictorial works. It saw the case in a completely different light, to 
which Part 2 now turns. 
in scope. It is without any coherent 
rationalizing principle, and totally de¬ 
structive of business certainty. 

The court’s discussion, and the 
defendant’s conduct challenged in the 
lawsuit, indicate the most important 
aspect of user interface and command 
structure protected by the Paperback 
decision. That aspect is the tree struc¬ 
ture of the commands. Identically du¬ 
plicating the tree was key to the 
defendant’s attempt to achieve total 
compatibility. With such duplication, 
users could transfer their training and 
their macro libraries at minimal trans¬ 
action cost. 

Protecting the tree moves copyright 
protection of user interfaces to a pre¬ 
viously unreached level of abstraction. 
All of the prior user-interface decisions 
appear to have decided whether there 
was copyright infringement on the ba¬ 
sis of whether screen displays looked 
alike. (See the Prior Decisions box.) 
While the Paperback court might have 
decided the case on that basis, it 
did not. Instead, the court zeroed in 
on the reason why the parties’ menu 
bars looked alike. The court protected 
the plaintiff’s interest reflected in that 
reason as a nonliteral aspect of the 
underlying computer program. The 
reason, of course, was that both 
spreadsheet programs were being 
operated in accordance with the same 
input-command tree. 

The direct/indirect issue 

A question was swept aside by the 
Paperback court’s nonliteralist analy¬ 
sis that would or should have been 
suggested by a more literal approach. 
That question is whether it is appro¬ 
priate to protect, by protecting some¬ 
thing conventionally protected by 
copyright law, something else that does 
not itself fit within that or any other 
known category of things protected by 
copyright. Here, one might ask whether 
protecting input-command structure by 
protecting the menu bars as pictures is 
appropriate. Pictures are generally 
protectable, as such. But the plan is 
Prior decisions 

No prior decision has expressly 
protected a command set and the 
structural, tree relationship of 
commands, as such. But a few 
prior decisions have some oblique 
intimations in this direction. 

The concept of “structure” as a 
nonliteral, noncode element of a 
computer program derives from 
the decision in Whelan Associ¬ 
ates, Inc. v. Jaslow Dental Labo¬ 
ratory, Inc. 2 In that case, the court 
held that the defendant infringed 
the copyright in the plaintiff’s 
computer program, even though 
the codes were different. Indeed, 
the computer programs were 
written in dissimilar programming 
languages. The court based its 
conclusion of substantial similar¬ 
ity primarily on “structural” simi¬ 
larities of five subroutines and in 
the choice of data fields within 
certain types of files. 

In Manufacturers Technologies, 
Inc. v. Cams, Inc., 3 the court protected 
the “sequence and flow” of 
a set of menus as included within 
the copyright protection given to 
the computer program. The se¬ 
quence and flow of the menus is 
a tree relationship similar to the 
tree relationship of the menu bars 
of the 1-2-3 program. 

In Digital Communications 
Assoc., Inc. v. Softklone Distrib. 
Corp., 4 the court found the main 
menu of the Mirror computer 
program to infringe the copyright 
in the main menu of Crosstalk. To 
some extent, although the court 
did not make a point of it, the 
court based its finding of substantial 
similarity on the fact that the 
same commands and parameters 
were listed in both menus under 
the same names and keystrokes. 
See IEEE Micro, Aug. 1989, p. 9. 


really to protect the idea of these pic¬ 
tures—the command tree that they de¬ 
pict. Is it inappropriate to protect 
input-command structure indirectly 
when it cannot, under a literal ap¬ 
proach, be protected directly? There is 
a division of authority on this point, 
with perhaps somewhat stronger 
support for the view opposed to pro¬ 
tecting indirectly that which is un- 
protectable directly. 

The US Supreme Court’s decision 
in Baker v. Selden 5 suggests one view. 
The Supreme Court held that one 
cannot protect bookkeeping forms that 
are a “necessary incident” of an art 
taught in a book on bookkeeping. The 
court reasoned that the art of book¬ 
keeping is not protectable as such by 
copyright. Thus, it would be improper 
to protect the art indirectly by pro¬ 
tecting directly the forms needed to 
practice the art. Nowadays, courts refer 
to this reasoning as the merger of idea 
and expression. 

On the other hand, courts have 
not protected houses by copyright, 
although copyright law protects 
drawings (including those of houses) 
against reproduction. Yet, as a practical 
matter one cannot efficiently copy a 
house without copying the blueprints 
for the house. Courts also hold that 
copying such blueprints is copyright 
infringement. 6 Thus, copyright law 
may indirectly protect the house against 
copying by protecting the blueprints. 

Under the Paperback approach, 
considering the question whether it is 
inappropriate to protect the command 
tree indirectly is unnecessary. The court 
simply protects it directly by defining 
it as a nonliteral aspect of the copy¬ 
righted computer program. That is one 
way to avoid a troublesome problem. 

Functionality 

Several aspects of the Paperback 
court’s treatment of possibly functional 
aspects of the 1-2-3 command tree deserve 
mention. Unquestioned copyright 
doctrine holds that copyright law does 
not protect functional aspects of computer 
programs or other works. However, 
disagreement often arises over 
what aspects of a computer program 
are functional. 

In the Micro Law series on screen 
displays and other user interfaces, 7 I 
suggested a solution to the disagree¬ 
ment. A proper concept of functional¬ 
ity includes whatever makes a 
computer program easier and faster to 
learn and use, leads to fewer user er¬ 
rors, or is otherwise better or cheaper 
than alternatives. This concept included 
conventions (such as using <F1> to in¬ 
voke Help menus). It also included 
standards, whether de facto (such as 
the QWERTY keyboard) or de jure 
(either blessed by the IEEE or ANSI or 
commanded by a governmental body). 

In the Paperback decision, the court 
emphatically and comprehensively re¬ 
jected any aspect of convention, de 
facto standard, or commercial need for 
compatibility as a justification for use 
of the command tree. First, the court 
denied that compatibility with 1-2-3 was 
commercially necessary. To show it 
was not, the court pointed to the exist¬ 
ence of other spreadsheets on the 
market that are not compatible with 
1-2-3. Indeed some of them have had 
superior functionality, performing tasks 
that 1-2-3 did not. But the court did 
not address the market position of the 
competitive products. 

It appears that no competitor has 
been able to gain a significant market 
share, and that 1-2-3 remains dominant. 
That fact, unremarked by the court, 
would seem to undercut the material¬ 
ity of the mere existence of competi¬ 
tion. (If incompatible spreadsheets are 
only marginally viable, it would seem 
highly material to determining whether 
a situation like that of the QWERTY 
keyboard exists.) At the very least, the 
matter should have been addressed. 
Instead, the court made a conclusory 
assertion that compatibility is com¬ 
mercially unnecessary. 

Second, the court said that the de¬ 
fendant should have dealt with the 
compatibility problem, if one existed. 


The defendant, the court said, should 
have written Help menus to explain to 
users how to substitute different key¬ 
strokes for those of 1-2-3. Also, the 
defendant should have written a com¬ 
puter program to convert customers’ 
macros from the existing 1-2-3 com¬ 
mand format to another, new format. 

The first part of this suggestion just 
ignores commercial practicability. To 
the extent that simple 1:1 keystroke 
substitutions are involved, the sugges¬ 
tion is comparable to giving a typist a 
Dvorak keyboard in place of the 
QWERTY keyboard, along with a chart 
showing the substitutions. To the ex¬ 
tent that the command tree or the ba¬ 
sic logical approach is to be changed, 
the suggestion is like sending the av¬ 
erage user to Bulgaria with nothing but 
a pocket English-Bulgarian phrase 
book and the court’s good wishes. 

As for the macro translator, the court 
said that it was a commercially practi¬ 
cable solution because another 
spreadsheet vendor had adopted it. 
Again, the court failed to address the 
acceptance of this expedient in the 
marketplace, as measured by market 
share. If, as appears to be the case, the 
market share of the company using the 
macro translator is very small compared 
to 1-2-3’s share, this is probably not a 
commercially practicable proposal. 

The court’s proposals are remarkably 
unhelpful. You can think of it this way. 
Say the defendant wanted to market 
an English-language computer pro¬ 
gram. Then, it claimed that it had to 
put the subject of a statement or sen¬ 
tence first, then the verb, and then the 
object, as in “The dog bit the boy.” 
Whereupon the defendant might re¬ 
ceive this response: 

You may as well say, “The dog 
boy bit,” and just assume that 
you are speaking German. Your 
customers will probably get used 
to it, eventually. And you can 
give them a Help chart explain¬ 
ing how to speak in German in¬ 
stead of in English. 


Or, as long as you are pro¬ 
viding a Help chart, why don’t 
you just add case endings and 
declensions and that sort of 
thing? Then you can scramble 
the order randomly and tell the 
customers to assume that they 
are speaking in Latin. 

And while you are at it, don’t 
feel limited to Indo-European 
languages. Chinese has some 
perfectly fascinating possibilities 
for icons and ideograms. Don’t 
worry, your customers may well 
adjust to it. 

In the next issue, I will continue this 
analysis with a short discussion of the 
standardization aspects of functional¬ 
ity. Afterward, I will comment about 
what is going on in the Paperback case, 
and more generally in software cases. 
System designers face a significant problem: 
What is the best way to measure 
system performance? Clearly, a few 
simple benchmarks aren’t the answer. We need 
to develop a standard way of measuring system 
performance. I propose an approach that I be¬ 
lieve is workable. I hope you will add your own 
ideas; possibly a formal standards working group 
will form. 

Some of the ideas presented here come from 
the concept of rate monotonic scheduling—a 
statistical method of determining CPU usage. 
You may want to consult the work proposed 
by Jerome C. Huck and Michael J. Flynn in an 
excellent IEEE Computer Society book, Ana¬ 
lyzing Computer Architectures ($42 nonmembers, 
$31.50 members). 1 This book provides a 
foundation for developing a system-measurement 
method. I use the authors’ Canonic Interpretive 
(CI) method as a basis for measuring 
the effects of a high-level language (see the 
Proposed Terms box). In this case the language 
is Ada, developed by the US Department 
of Defense. 

The CI method of measurement assesses the 
size of the smallest representation of a 
high-level-language program in relationship to the 
system resources (CPU, memory). CI also helps 
to determine how much work must be done 
by the system to make use of the program 
code. 

For example, let’s consider a system that 
serves as a multitasking unit. Say that power, 
budget, and environment considerations con¬ 
strain the memory. The system operates with 
all CSCI (Computer Software Configuration Item) 2 
components in memory at one time (the sizes 
are questionable). Its equivalency factor is based 


on KIPS, or thousands of instructions per sec¬ 
ond, a term specific to small-scale processors. 

Using the KIPS of a system related to source 
lines of code is probably an invalid approach 
when applied to most processor environments. 
It doesn’t account for compiler capability or 
compactness. The KIPS formula says that for 
every source line of code, n machine instruc¬ 
tions execute. This concept would be inter¬ 
esting if it were true—but it isn’t. Further, the 
KIPS approach describes an interpretative 
method. In other words, each line of code must 
be interpreted, from high-level Ada to machine 
code. This is simply not the case. 
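
The objection to the fixed-ratio formula can be made concrete with a small sketch. The function names and the per-line instruction counts below are hypothetical illustrations, not figures from any real compiler; they only show how an estimate built on a single n drifts once source lines expand to different numbers of machine instructions.

```python
def kips_time_estimate(source_lines, n_per_line, kips):
    """Seconds predicted by the fixed-ratio KIPS formula.

    Assumes every source line expands to exactly n_per_line machine
    instructions, which is the assumption the column disputes.
    """
    return source_lines * n_per_line / (kips * 1000.0)


def time_from_counts(instructions_per_line, kips):
    """Seconds implied by per-line instruction counts (hypothetical data)."""
    return sum(instructions_per_line) / (kips * 1000.0)


# Hypothetical program: four source lines that compile to very different
# numbers of machine instructions (for example, a call versus an assignment).
counts = [2, 40, 3, 15]
fixed = kips_time_estimate(len(counts), n_per_line=5, kips=500)
actual = time_from_counts(counts, kips=500)
print(fixed, actual)
```

With these made-up counts the fixed-ratio estimate is a third of the time implied by the per-line counts, which is the kind of gap a compiler-aware measure would have to close.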

Our example also assumes that the CPU power 
is directly equatable to a MIPS (million instructions 
per second) rating. 

Although a MIPS rating can give you a rough 
idea of how much CPU horsepower is avail¬ 
able, it doesn’t provide you with a true picture 
of the system. To determine this, you should 
evaluate the appropriateness of the machine 
in relationship to your subsystem requirements. 
Process speed, throughput, and availability of 
memory are all important. You should also 
establish a deterministic model that is useful 
in all cases for all processors. It’s also impor¬ 
tant to limit the application control in relationship 
to container (memory) size. 

The KIPS approach also fails to account for 
several performance factors (especially in terms 
of Intel Corp.’s 386 microprocessor). These 
include source-code compilation, processing 
and memory architecture, hardware/software 
considerations, and operating system processing. 

The 386 compiles source code (in the best 
cases by an optimizing compiler). The source 

continued on p. 44 
Proposed terms 


Activity refers to the total number of CI objects accessed. 
Try to keep this number small; remember small models 
work better than large ones. 

Block size is the amount of memory needed to store 
a defined environment. I use 1 Mbyte, although this 
amount can be reduced for a transfer function that 
involves 1-Kbyte blocks, for instance. 

Block work is a measure of the CPU time needed to 
process a block of data. 

The Canonic Interpretive (CI) method statistically 
measures a process and actually forms the basis for 
rate monotonic scheduling. CI operations roughly equate 
to instructions. CI forms a lower bound so you can 
assume an “ideal” boundary, and it has no knowledge 
of frequency distribution of program constructs, static 
or dynamic. Therefore, this measure considers any compiler 
optimization to be complete and assumes the source 
to be in its best form. (“Best form” means that care 
was taken in creating well-crafted code.) 

A container describes the boundaries of the envi¬ 
ronments. In this case, a container size may be 4 Mbytes. 
This container is defined as either a 16-bit or a 32-bit 
word, one of the elements that the standard should 
determine. The word size determines how I/O is measured. 

Control transfer points label a point at which the 
procedure transfers control, data, or function to an¬ 
other procedure (forward or backward). 

The correspondent assumes a one-to-one relationship 
with executed instructions and operands. Therefore, 
the number of CI operations (instructions) executed 
equals the dynamic number of high-level-language ac¬ 
tions in the source program. Thus, a procedure made 
up of Fors and Tos can be treated as one high-level 
action. 

CPU intensity refers to block size and block work. 
The goal is to approach unity, which would denote 
one processor cycle per word. The best you can pres¬ 
ently achieve is about five cycles per word. 
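
The definition gives no formula, so the following is only one plausible reading: divide the processor cycles spent on a block (block work) by the number of words in the block (block size). The function name, the 4-byte word, and the 20-MHz clock are assumptions chosen for illustration.

```python
def cycles_per_word(block_bytes, word_bytes, block_work_seconds, clock_hz):
    """Processor cycles spent per word of a block (assumed reading of CPU intensity)."""
    words = block_bytes // word_bytes
    if words <= 0:
        raise ValueError("block must contain at least one word")
    cycles = block_work_seconds * clock_hz
    return cycles / words


# Hypothetical case: a 1-Mbyte block (taken here as 2**20 bytes) of 4-byte
# words, processed in 65.536 ms on a 20-MHz processor.
intensity = cycles_per_word(2**20, 4, 0.065536, 20_000_000)
print(round(intensity, 3))
```

Under this reading, a result near 1 would be the unity goal, and a result near 5 matches the practical figure quoted above.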

CPU utilization is a percentage measurement of 
CPU time/(CPU time + I/O time). 
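
As a minimal sketch, reading the definition as CPU time divided by the sum of CPU time and I/O time (the grouping is an assumption, since the printed expression has no parentheses), the percentage can be computed as follows. The function name and the sample times are hypothetical.

```python
def cpu_utilization(cpu_time, io_time):
    """Percentage of total time spent in the CPU: CPU / (CPU + I/O)."""
    total = cpu_time + io_time
    if total <= 0:
        raise ValueError("total time must be positive")
    return 100.0 * cpu_time / total


# Hypothetical measurement: 3 s of CPU time and 1 s of I/O time.
print(cpu_utilization(3.0, 1.0))
```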

Distance is a measurement of the space between ob¬ 
jects in transition times. This measurement is only impor¬ 
tant when a large number of similar transition points exist. 
You may even use this for “virtual” objects. So {An} may be 
n-transition (machine-cycle) times away from {Bn}. 

Environment refers to the bounds of the program re¬ 
gion (you can call this a procedure, or a CSCI). 


A Gibson Mix is the totality of typical, executed instructions 
of a processor instruction set. For example, while 
the IBM 370 has 378 instruction types, only 78 are used. 

I (elapsed time) is the average time necessary to transfer 
1 Mbyte of data via the I/O system. (You need to know the 
characteristics of the I/O.) 

K (processing time) is the CPU time needed to process a 
1-Mbyte block. 

The statistical Kendall ranking 3 method doesn’t assign 
an artificial index. It ranks positions of procedures to other 
elements in a system, as opposed to a Spearman ranking, 
which is monotonic. To determine the right Kendall rank¬ 
ing, you analyze the collection of procedures and give them 
a name of your choice (for example, 1 or 2). The method 
lets you choose the base and bounds. This ranking is by 
nature bounded, or it has no meaning. 

A MIMD (multiple-instruction/multiple-data) stream de¬ 
fines the way a processor fetches instructions and data 
from logical memory. Thus a processor fetches or pipe¬ 
lines more than one instruction and associated data into its 
registers. The 386 operates this way to keep the fetch queues 
full and maximize processing time. A 20-MHz 386 takes 10 
machine cycles to perform a memory-to-register move in¬ 
struction. This common test is one of the more prevalent 
activities that a processor performs.

The objects notion can be confusing because it has several
meanings, depending on what you are doing. For our
purposes, it means operations, operands, and control/ 
transfer points. 

Procedure has a complex definition. A procedure con¬ 
sists of code elements with a beginning, a middle, and an 
end. A procedure can be one statement. 

A SIMD (single-instruction/multiple-data) stream characterizes
typical von Neumann processors.

Size refers to the extent of each object under consideration,
in bits. The system equation is log2(number of
similar objects in the environment).
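
The Size entry can be illustrated with a short Python sketch. Rounding
up to a whole number of bits is an assumption made here (the entry gives
only the logarithm), and the function name is hypothetical. The two
sample counts are the IBM 370 figures quoted in the Gibson Mix entry.

```python
def object_size_bits(similar_objects: int) -> int:
    """Whole bits needed to tell `similar_objects` items apart.

    Equivalent to ceil(log2(n)), computed with integers to avoid
    floating-point rounding. A single object needs 0 bits.
    """
    if similar_objects < 1:
        raise ValueError("need at least one object")
    return (similar_objects - 1).bit_length()


print(object_size_bits(378))  # all 378 instruction types
print(object_size_bits(78))   # only the 78 types said to be used
```

Distinguishing the 78 used instruction types takes 7 bits, against 9
bits for the full set of 378.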

Stability refers to the notion that the number of environ¬ 
ment and control transfer points are kept to a minimum. In 
other words, you should try to find the point of stability at
which you get the most meaningful information. 

A value machine is an idealized model of the physical 
machine. No perturbations are allowed that may upset the 
operation of the machine, at least for the analysis. 

Work load refers precisely to what the system can withstand
in terms of memory usage, I/O, and performance.
code elements match the instruction
set of the processor. Therefore, just a 
few instructions can represent several 
lines of code. For example, the pro¬ 
cessor treats active standing verbs, such 
as Begin and End, and library func¬ 
tions once—not multiple times. The 
Data Analysis Center at Wright- 
Patterson Air Force Base denotes a 
typical translation as 1.8 to 2.4 in¬ 
structions per line of Ada source code, 
based on many optimized compila¬ 
tions. 

The 386 processor and its memory 
architecture are robust. The 386 con¬ 
trols the memory and its segmenta¬ 
tion. Further, the 386 allows you to 
lock out (protect) segments of memory 
based on boundaries. In addition, the 
386 provides multiple hardware stacks 
and supports software stacks that are 
independent of the hardware. 

The processing environment defines 
itself based on the operating system. 
The OS scheduler, working in tandem 
with the 386 processor, determines 
what code is resident at any time. 

Another 386 factor is its internal 
Harvard architecture, which can fetch 
data and instructions in parallel. This 
fact makes a big difference in decid¬ 
ing how to create the proper model. 

Definitions 

I proposed the earlier set of terms 
to serve as part of this standard. I took 
some of them from Huck and Flynn; 
others are my own. (I don’t discuss 
all the terms listed, making the as¬ 
sumption that readers have a funda¬ 
mental knowledge of system analysis.) 

Analysis 

The whole purpose of analyzing a 
system is to measure the work load 
in a value machine. This process de¬ 
termines the necessary cache and 
memory requirements, and establishes 
some notion of the I/O parametrics. 

CI establishes the number of operations,
operands, and labels necessary
to create a program or CSCI. To
make this analysis work, we need to
determine CI. Functions of operations,
operands, and labels appear as syl¬ 
lables. The syllable is an important 
determination and therefore must be 
uniform. Using KIPS based on source 
lines of code is only accurate for an 
interpretive environment. Here, we’ll
ignore the frequency and introduce 
some number we can establish for 
conversion. 

For our purposes, I chose the 
number ranges from 1.8 to 2.4 pro¬ 
vided by the Data Analysis Center 
rather than the number 7 as selected 
in Huck and Flynn. You can always 
increase the number if you need to 
consider swag in the conversion of 
source code to executable objects. 

See the table for a system profile 
based on CI measuring methods. A
Kendall ranking 3 can replace the table, 
depending on how you want to real¬ 
ize the data. In the table, Exe refers 
to the execution time in either sec¬ 
onds, microseconds, nanoseconds, or 
machine cycles. You can choose the 
parameter, but you have to be con¬ 
sistent. Ops refers to the total num¬ 
ber of operations needed to effect the 
procedure element. 

The size and activity of the model 
need to be spread apart. For example, 
the table shows such operational 
characteristics as operands and labels. 
When you set up the characteristics 
as defined by the table, you define 
the overall system architecture. 

In establishing the uniformity of the 
model, you take into account that each 
compiler has unique ratios between 


source-code operations (and operands) 
and the number of processor instruc¬ 
tions. For optimized compilations this 
number ranges from 1.8 to 2.4. Any 
high-level verb takes at least 1.8 
instructions. 
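
As a rough illustration of that ratio, the following Python sketch
converts a source-line count into an estimated range of processor
instructions. The function name and the 10,000-line example are
hypothetical; only the 1.8 and 2.4 factors come from the text.

```python
def instruction_range(source_lines: int,
                      low: float = 1.8,
                      high: float = 2.4) -> tuple[float, float]:
    """Estimated (minimum, maximum) instructions for optimized Ada code."""
    if source_lines < 0:
        raise ValueError("source_lines must be non-negative")
    return source_lines * low, source_lines * high


low_estimate, high_estimate = instruction_range(10_000)
print(f"{low_estimate:,.0f} to {high_estimate:,.0f} instructions")
```

Raising `high` is one way to allow for the extra margin mentioned
earlier when the conversion from source code to executable objects is
uncertain.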

Using these numbers as a basis, I’ve 
constructed the notion of a mean term, 
or an MOP (mean operating profile), 
to more accurately picture compiler- 
to-instruction-set interaction. 

Assumptions 

In any model you have to make 
some assumptions. I made the fol¬ 
lowing. The system uses a 20-MHz 
386. The system latency equals c (the 
speed of light, which can be ignored). 
We are operating in a domain so slow 
that any latency caused by bus trans¬ 
fers is minimal. Each instruction takes 
10 machine cycles. The SDP (standard 
data processor) Ada code is highly 
optimized and uses firmware and OS 
calls. Only some portions of any en¬ 
vironment are resident in memory at 
any one time, based on OS schedul¬ 
ing and the definition of the process¬ 
ing environment. 
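
The timing assumptions reduce to simple arithmetic. A minimal sketch,
assuming exactly the stated 20-MHz clock and a flat 10 machine cycles
per instruction (the names are illustrative):

```python
CLOCK_HZ = 20_000_000          # 20-MHz 386, as assumed in the text
CYCLES_PER_INSTRUCTION = 10    # flat cost assumed in the text


def instructions_per_second() -> float:
    """Instruction rate implied by the two assumptions."""
    return CLOCK_HZ / CYCLES_PER_INSTRUCTION


def execution_seconds(instruction_count: float) -> float:
    """Time needed to run a given dynamic instruction count."""
    return instruction_count * CYCLES_PER_INSTRUCTION / CLOCK_HZ


print(instructions_per_second())     # instructions per second
print(execution_seconds(2_000_000))  # seconds for two million instructions
```

Under these assumptions each instruction takes 0.5 microseconds, so the
model machine executes 2 million instructions per second.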

So that the environment can be 
properly represented, I’ve modified 
Huck and Flynn’s approach to deter¬ 
mining the extent of the environment: 

$(\log_2 \mathrm{MOP}) \times A_1 + (\log_2 A_2) \times B_1 + (\log_2 B_2) \times C_1$

in which A, B, and C represent the
subordinate procedures that make up
the environment of a CSCI. This formula
depicts the total size of the environment.
Note that the MOP differs
for each environment. In one processor,
some functions can represent
several environments. A complex
relationship exists between each environment
and the state of the
environment(s) at any given point in
time.

System profile based on CI measurements.

Procedure     Exe   Ops   Operands   Labels   Branches taken   Branches entered
Test for A    100   -     -          -        -                33
If stat = A   100   1     3          -        20               -
Do no test    100   1     3          1        -                -
Else test     100   5     2          1        1                -
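
The environment-size formula can be sketched in Python. This assumes
the formula reads (log2 MOP) x A1 + (log2 A2) x B1 + (log2 B2) x C1,
which is one reading of the printed equation. The argument names and
sample values are placeholders, not figures from the article.

```python
import math


def environment_size(mop: float, a1: float, a2: float,
                     b1: float, b2: float, c1: float) -> float:
    """Environment extent under the assumed reading of the formula."""
    return (math.log2(mop) * a1
            + math.log2(a2) * b1
            + math.log2(b2) * c1)


# Placeholder values chosen so each logarithm is a whole number.
print(environment_size(mop=2, a1=10, a2=4, b1=5, b2=8, c1=2))
```

Because the MOP differs for each environment, the function would be
evaluated once per environment rather than once per system.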

The dynamic instruction count is a 
primary measure of the architecture. 
This count tells us how efficiently the 
system executes a program (no mat¬ 
ter how badly the program is written). 
Because we have no page faults to 
contend with, the context remains 
constant (contrary to Motorola’s pro¬ 
cessors). The function is a copy-in, 
copy-out process. 

The static program size has a sec¬ 
ondary effect on the architecture and 
optimization. However, this size rep¬ 
resents the impact on system memory. 
This designation doesn’t imply that a 
2-Mbyte program uses 2 Mbytes of 
memory at one time. Rather, it estab¬ 
lishes the upper boundary. 

Since much of the work performed 
in any given processor/memory sys¬ 
tem involves fetching data objects, 
heavy processing tasks require only 
moderate concern. Exceptions do ex¬ 
ist, of course. The notion is to pre¬ 
serve as much processing resource 
(memory and I/O bandwidth) as 
possible. 

Optimum values 

To define the optimum values of 
instructions to executable functions, 
you must determine the Gibson Mix. 4 
You do this by analyzing the efficiency 
of the compiler and the overall soft¬ 
ware architecture. The OS can perturb 
the Gibson Mix by forcing double ex¬ 
ecution of certain instruction types. For 
example, improper interaction can 
cause multiple memory-to-memory 
moves. 
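
A Gibson Mix is conventionally expressed as a set of weights over
instruction classes, from which a mean instruction cost follows. The
sketch below is only an illustration: the classes, weights, and cycle
counts are hypothetical, not measured 386 figures.

```python
def mean_cycles(mix: dict[str, tuple[float, float]]) -> float:
    """Weighted mean cycles per instruction.

    `mix` maps an instruction class to (weight, cycles). Weights need
    not sum to 1; they are normalized here.
    """
    total_weight = sum(weight for weight, _ in mix.values())
    if total_weight <= 0:
        raise ValueError("mix must contain positive weights")
    weighted = sum(weight * cycles for weight, cycles in mix.values())
    return weighted / total_weight


# Hypothetical mix: class -> (share of executed instructions, cycles each).
baseline = {
    "memory_move": (0.50, 10.0),
    "arithmetic": (0.30, 4.0),
    "branch": (0.20, 8.0),
}

# Same program, but the OS forces every memory move to execute twice.
perturbed = dict(baseline, memory_move=(1.00, 10.0))

print(f"baseline:  {mean_cycles(baseline):.2f} cycles per instruction")
print(f"perturbed: {mean_cycles(perturbed):.2f} cycles per instruction")
```

Doubling the weight of the most expensive class raises the mean cost,
which is the kind of perturbation described above.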

You can use such tools as J, K, and 
CPU utilization to model the system 
for perspective on the type of user 
operation you seek: 


Elapsed time = (J + K)(CPU intensity)

Page rate = 2 (number of pages)/ J 

Elapsed time is measured in seconds 
per Mbyte. You must define the page 
(a 386 typically has a 4-Kbyte page). 
You can, if desired, tabulate this data 
for later use to create a graphics repre¬ 
sentation. I recommend that you use a 
convenient tool such as an Excel 
spreadsheet to enter the data and per¬ 
form the calculations. 
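
The same tabulation can be done in a few lines of Python instead of a
spreadsheet. This is a sketch under stated assumptions: J is treated as
the I/O time and K as the CPU time per Mbyte, the utilization formula is
the one given in the glossary, and the sample values of J and K are
hypothetical.

```python
def elapsed_time(j: float, k: float, cpu_intensity: float) -> float:
    """Elapsed time = (J + K) x CPU intensity, in seconds per Mbyte."""
    return (j + k) * cpu_intensity


def cpu_utilization(j: float, k: float) -> float:
    """CPU time / (CPU time + I/O time), as a percentage.

    Assumption: K stands in for CPU time and J for I/O time.
    """
    return 100.0 * k / (j + k)


# Hypothetical inputs: 0.5 s of I/O and 1.5 s of CPU time per Mbyte.
J, K = 0.5, 1.5

print("intensity  elapsed_s_per_Mbyte")
for intensity in range(1, 6):  # 1 is the ideal; about 5 is the stated best
    print(f"{intensity:9d}  {elapsed_time(J, K, intensity):19.2f}")
print(f"CPU utilization: {cpu_utilization(J, K):.1f}%")
```

The printed rows are the data points for an elapsed-time-versus-CPU-
intensity plot.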

Now you have several ways to ana¬ 
lyze the modeling process. The next 
task is to plot the data from this table: 
elapsed time versus CPU intensity and 
CPU utilization versus CPU intensity. 

You can consider one final model 
that measures the CPU utilization fac¬ 
tor. Kenniston W. Lord, Jr., suggested 
the following method in the CDP Re¬ 
view Manual, A Data Processing 
Handbook:

$\mathrm{CPU}_{idle} = EC_1 \times \mathit{scan}_{idle} \times I \times 1{,}000 / \mathit{interval}$

CPU** = EG* scary + EC 5 /EC 2 * 
scary,,, * I* 1,000/ interval 

$\mathrm{CPU}_{utl} = 100 - (\mathrm{CPU}_{idle} + \mathrm{CPU}_{wait})$

Here, $EC_1$ is the number of times no
dispatches exist (idle count), $EC_2$ is the
busy count (a lot of dispatches), and
$EC_3$ is the count of tasks waiting for an event.
The term $\mathit{scan}_{idle}$ represents the number of
instructions executed at idle, $\mathit{scan}_{wait}$ is the
number of instructions executed to
determine if a task is waiting, and $I$
equals the instruction-execution time
in $\mathit{scan}_{idle}$. This time can have many
values. For our work, we assume average
instruction time.

The bottom line 

The bottom line is that you can 
model the system. You should come 
out with a utilization-factor graph that 
will give you an idea of how best to 
use a processor/memory system. 

The overall notion here is to pro¬ 


vide a foundation for a utilization stan¬ 
dard. Much of this work is still in de¬ 
velopment. The Huck and Flynn book 
forms a solid foundation. If you have 
an interest in this type of performance 
measurement, please drop me a note.
Desk planners plus


The 1991 Computer Desk Reference and 
Appointment Calendar (Microsoft Press, 
Redmond, Wash., 224 pp., $19.95)

Every year I structure my life around some 
sort of appointment book. If you do this too, 
you may find this calendar interesting. 

The book measures approximately 8 inches 
by 10 inches and provides a two-page spread 
for each week of the year. Monday through 
Friday appear in columns of their own. Each 
column holds lines labeled 7 through 6 plus 
LUNCH and EVE. Then there are sections for 
KEY TASKS, PHONE OR WRITE, and EXPENSES. 
Along the bottom of the two-page spread are 
calendars for 12 months, centered on the current 
month. Saturday and Sunday and the quota¬ 
tion of the week share a column. 

Many of the quotations are specific to the 
computer field. One of my favorites is “Ad¬ 
vertising cannot overcome major marketing 
flaws in your program.’'—Andrew Donchak. 
Another is “Good design keeps the user happy, 
the manufacturer in the black and the aesthete 
unoffended.”—Raymond Loewy. 

The book provides the usual places for 
personal data and for names and addresses. 
It also provides a quarterly travel planner. 
Each calendar quarter uses one page, and many 
computer industry trade shows are shaded 
in. The book also allows two-page spreads 
for each of 1991 and 1992. In these, it allocates 
one small rectangle for each day. 

The reference material included in the book 
is quite useful. I especially like the minitravel 
guides provided for Atlanta, Boston, Chicago, 
Dallas, Denver, Houston, Los Angeles, New 


Orleans, New York, San Francisco, Seattle, 
and Washington. Each is a two-page spread 
with an area map on one page. The other 
page contains a downtown map, some gen¬ 
eral information, and a few hotels and res¬ 
taurants. I have no idea how Microsoft selected 
the hotels and restaurants. 

The book contains many other reference 
pages related to travel. It also contains a two- 
page summary of Dataquest statistics and many 
pages of lists of companies, organizations, 
and trade shows. It also offers a feeble tech¬ 
nical reference section, which contains Intel 
instruction sets and industry-standard char¬ 
acter sets. 

I might not have designed this book exactly 
the way Microsoft did, but I think it is well 
designed, and I think many people in the 
computer industry will find it useful. And if 
you’re looking for a present for the computer 
professional who has everything, this might 
just fill the bill. 

Cache and Memory Hierarchy Design — 
A Performance-Directed Approach , Steven 
A. Przybylski (Morgan Kaufmann, San Mateo, 
Calif., 1990, 223 pp., $33.95)

This is a reworking of the author’s 1988 
thesis, which he wrote at Stanford under the 
tutelage of John Hennessy and Mark Horowitz. 
Not surprisingly, Przybylski performs the same 
kind of quantitative analysis that Hennessy 
and Patterson advocate in their recent book 
on computer architecture (see Micro Review, 
June 1990).
Przybylski measures cache design
with the metric of program execution 
time. He contrasts this with the time- 
independent metric of cache miss rate. 
Execution time is more useful than 
miss rate but requires an elaborate 
analysis, which Przybylski provides. 
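
A small Python sketch shows why the two metrics can disagree. It uses
the standard textbook relation (cycles per instruction plus memory
stall cycles, multiplied by cycle time), not Przybylski's own model,
and every number in it is hypothetical.

```python
def time_per_instruction_ns(base_cpi: float,
                            refs_per_instruction: float,
                            miss_rate: float,
                            miss_penalty_cycles: float,
                            cycle_ns: float) -> float:
    """Average time per instruction, including cache-miss stalls."""
    stall_cpi = refs_per_instruction * miss_rate * miss_penalty_cycles
    return (base_cpi + stall_cpi) * cycle_ns


# Hypothetical design A: small, fast cache (higher miss rate, short cycle).
small_fast = time_per_instruction_ns(1.5, 1.3, 0.05, 10, 40.0)

# Hypothetical design B: larger cache that lowers the miss rate but
# stretches the cycle time from 40 ns to 50 ns.
large_slow = time_per_instruction_ns(1.5, 1.3, 0.03, 8, 50.0)

print(f"A: 5% misses, {small_fast:.1f} ns per instruction")
print(f"B: 3% misses, {large_slow:.1f} ns per instruction")
```

In this example design B wins on miss rate yet loses on execution
time, because the longer cycle penalizes every instruction, not just
the ones that miss.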

Przybylski views cache design from 
a system perspective. He recognizes 
that when CPU and cache are de¬ 
signed together, a more nearly opti¬ 
mal solution can emerge. Of course, 
the user of a commercial micropro¬ 
cessor must take the CPU as given 
and design a cache around it, but 
Przybylski’s analysis still applies in 
the most interesting cases. 

Przybylski first analyzes the problem 
for a single-level cache, then shows 
how to determine the optimal mul¬ 
tilevel memory hierarchy. Of course, 
he does not present a formula or even 
an algorithm to determine the opti¬ 
mum cache parameters for a given 
system. However, he does show how 
to evaluate the various possible trade¬ 
offs quantitatively. 

Przybylski complements his analysis 
with simulations. He uses a sophis¬ 
ticated simulator and a variety of 
program traces to generate numeri¬ 
cal results of “experiments.” 

Computer system designers and 
computer science students will find 
this book interesting. The book is 
not for the casual reader, but Przybylski 
provides separate summaries of 
findings for readers who wish to avoid 
the gory details. 

The Elements of Spreadsheet Style, 

John M. Nevison (Simon & Schuster, 
New York, 1987, 212 pp., $12.95.) 

I used to worry that we were mov¬ 
ing backward. Just as structured pro¬ 
gramming had begun to rationalize 
mainframe programming, the personal 
computer created large hordes of Ba¬ 
sic programmers and spreadsheet 
writers. These mass programming me¬ 
dia lacked the tools for systematic pro¬ 
gramming. Practitioners of the new arts 


reinvented spaghetti code, data en¬ 
tangled with programs, comment-free 
listings, and trial-and-error design. 
Modularity went out the window. What 
a mess! 

Things have progressed. Basic has 
given way to C—an improvement, 
but not a solution—and now John 
Nevison has stepped forth to try to 
enlighten the spreadsheet hacker. 

Like The Elements of Programming 
Style by Kernighan and Plauger, a
classic from the early 1970s, Nevison’s 
book was inspired by The Elements 
of Style by Strunk and White. As an 
admirer of both of those works, I 
began to read Nevison’s book with 
interest and enthusiasm. 

Nevison sets forth 22 rules of 
spreadsheet design. He devotes most 
of the book to justifying and illus¬ 
trating these rules. He also talks a 
little about reusing spreadsheets and 
building collections of spreadsheets. 
I don't think that most spreadsheet 
programs contain adequate tools to 
support a truly modular approach, 
but I’m glad that someone is address¬ 
ing the issue. 

Nevison’s rules are 

1. Make a formal introduction 

2. Title to tell 

3. Declare the model’s 
purpose 

4. Give clear instructions 

5. Reference critical ideas 

6. Map the contents 

7. Identify the data 

8. Surface and label every 
assumption 

9. Model to explain 

10. Point to the right source 

11. First design on paper 

12. Test and edit 

13. Keep it visible 

14. Space so the spreadsheet 
can be easily read 

15. Give a new function a new 
area 

16. Report to your reader 

17. Graph to illuminate 

18. Import with care 


19. Verify critical work 

20. Control all macros 

21. Focus the model’s activity 

22. Enter carefully. 

Some of these rules don’t mean 
what you might think, so you’ll have 
to read the book to get full benefit 
from the list. For example, the 22nd 
rule has nothing to do with typing 
errors. Nevison is telling you to control 
the entry of data into submodels, just 
as you would declare subroutine 
parameters in a C program. 

Nevison illustrates these rules with 
actual spreadsheets. This could be 
incredibly boring, but he avoids that 
problem with humor. He bases his 
spreadsheets on nursery rhyme themes 
and embeds little historical puzzles 
in them. 

If you write a lot of spreadsheets, 
or if you don’t because they don’t 
seem to apply to your problems, take 
a look at this book. It’s worth some 
study.
Quality improvement


Some Japanese companies manage the
software development process better than 
comparable US companies. In addition, 
numerical and nonnumerical software develop¬ 
ers need more interaction. 

These major conclusions surfaced at last 
winter’s Second International Workshop on 
Software Quality Improvement in Kyoto, Japan. 
I summarize these meetings from the perspec¬ 
tive of a numerical analyst. 

Goals 

Sponsored jointly by the Ministry of Interna¬ 
tional Trade and Industry (MITI), the Joint Sys¬ 
tem Development Corporation (JSD), and the 
Research Institute for Software Engineering 
(RISE), the workshop brought together almost 
60 scientists. About a dozen came from the United 
States, one from Italy, and the remainder from 
Japan. Koji Torii of Osaka University and Victor 
Basili from the University of Maryland jointly or¬ 
ganized the workshop. 

The official goal of the Software Quality 
workshop was to promote the use of quantita¬ 
tive and qualitative evaluation. This use will help 
us understand the effects of various software 
processes and how to improve software mea¬ 
surement techniques. In 1990 billions of dollars 
were spent on computer software; the US Navy 
alone spent $10 billion. In the future this figure 
will rise. Logically therefore, we must conduct 
research in methods that can lead to reduced 
rates of increase in these costs. All large tech¬ 
nology organizations face a similar challenge. 

KRP 

The workshop was held at the Kyoto Research 
Park, a new and spectacularly modern facility 
about 15 minutes by taxi from the Kyoto train 


station. KRP bills itself as a “self-contained 
laboratory complex with all the benefits of a 
modern city.” Apparently with help from the city 
of Kyoto, KRP continues to construct and man¬ 
age this large facility, which commenced opera¬ 
tion in the fall of 1989. 

The sessions offered simultaneous translation 
in Japanese and English so participants could 
use the most natural language. (The wireless 
receivers were German.) Last year’s English-only 
workshop proved to be a deterrent to many of 
the Japanese participants. 

JSD 

About 100 related businesses and 19 major 
information processing companies founded JSD 
in 1976. JSD seeks to improve technical standards 
and to strengthen the Japanese information pro¬ 
cessing industry. JSD focuses on national projects, 
development of a paperless system for the Japa¬ 
nese Patent Office, distribution of the Spider 
image-processing package, cooperation with the 
Small Business Promotion Corporation, and 
software sales promotions. 

Observations 

As a scientist who has been developing soft¬ 
ware throughout my career, I realized after lis¬ 
tening for two days that the issues addressed 
here apply to a very different class of problems 
than those faced by myself and my peers. The 
best sense of this came to me while listening to 
the speaker from Mitsubishi. M. Masuda claimed 
that his company produces more than 34 million 
lines of software (code) each year. Most of this 
is developed to run industrial processes such as 
manufacturing production lines. 

The main questions associated with this 
software are not whether the algorithm is clever,
efficient, or creative. They concern how
to reduce the number of potential errors 
that large, multiperson software projects 
can produce. In fact the most important 
parameter discussed at this workshop 
was BKLOC, or bugs (errors) per 
thousand lines of code. This commu¬ 
nity appears to have little room for 
the traditional questions about nu¬ 
merical methods that physicists, 
chemists, and engineers raise when 
thinking about computer programs. 

I concluded first that a general per¬ 
ception existed that the Japanese 
(companies) were doing a better job 
than their US counterparts in reducing 
error rates. Basili claimed that Japanese 
BKLOC rates were about one order- 
of-magnitude less than those in the US, 
down below 1 BKLOC. Further, he said 
that Japanese software productivity 
was from two to 20 times that in the 
US. He claimed the major reasons were 
the Japanese companies’ better man¬ 
agement techniques and better meth¬ 
ods of motivating people. (After 
listening to several talks by represen¬ 
tatives of Japanese industry, I concurred 
with this assessment.) 

Nevertheless, the gap between in¬ 
dustry and research was greater in 
Japan than in the US. Basili felt entry- 
level Japanese software developers are 
less well prepared than their US 
counterparts. But the situation reverses after about
five years of work. Japanese compa¬ 
nies do a good job of training and 
retraining workers, so perhaps this 
accounts for part of the turnaround 
effect. (On a personal level I have been 
amazed to discover courses in Unix, 
mathematics beyond calculus, and 
statistics on the educational channel 
of our home TV during prime time.) 
Cohost Torii echoed the remarks about 
the narrow pipeline between Japanese 
university research and industry. 

Secondly, I was surprised by the 
meager attendance and intellectual 
presence of US industry, even con¬ 
sidering the distance to Kyoto. W. 
Agresti from Mitre Corporation dis¬ 
cussed the ways to assure quality in 


software being developed by others. 
M. Deutsch (Hughes Aircraft) discussed 
an after-the-fact analysis he had done 
of some of Hughes’ projects while he 
was on sabbatical. F. McGarry (NASA) 
discussed work done to a large extent 
in collaboration with Basili at Maryland. 
Representatives of CTA Inc., a small 
Maryland contractor, and MCC, a Texas 
software think tank, also presented 
papers. 

Pictographic kanji 
lets readers grasp 
the essence of a 
page more 
quickly. 

Some other American companies 
attended but presented no papers. 
Representatives from AT&T Bell Labs 
(Columbus, Ohio, office only) and GenRad
in Massachusetts wanted to learn
more about Quality Function De¬ 
ployment, a method developed by 
Tadashi Yoshizawa of Tsukuba Uni¬ 
versity. QFD is well known and has 
been promoted by Yoshizawa since 
the 1970s. M. Ohba from IBM Japan 
spoke only about modeling the errors 
that were discovered after software 
was delivered. (I questioned his use 
of linear differential equations as being 
simplistic.) 

On the other hand, Japanese industry 
was well represented. Oki Electric, 
NEC, Mitsubishi, Fujitsu, Nippon Steel, 
Sharp, and JSD papers gave very clear 
evidence that these companies try hard 
to learn how to improve software 
productivity. Papers by university re¬ 
searchers from both countries were 
mostly excellent. 

I was struck by the fact that the 
Japanese really work at reducing 
software errors. They strongly em¬ 
phasize user requirements driving the 


design process (natural in any com¬ 
pany that produces a product), a team 
approach to solving problems, de¬ 
veloping methods that assure quality 
at every step of the software process, 
and quality control involving all de¬ 
partments including top management. 
The speaker from Nippon Steel said 
it very well, “We put stress on hu¬ 
man motivation as well as machine’s 
automation.” 

Selected details 

The Tetsuo Tamai (Tsukuba Uni¬ 
versity) paper on Japanese-based pro¬ 
gramming tools generated a substantial 
amount of discussion. Essentially, 
Tamai studied the existing program¬ 
ming languages that use Japanese in 
some direct way and attempted to 
compare them. The discussion centered 
not so much on his results, which were 
very tentative and modest, but on the 
general usefulness of a Japanese pro¬ 
gramming language. Basili pointed out 
earlier that he had been impressed with 
the strides made in the use of Japa¬ 
nese syllabic input (hiragana) with re¬ 
sulting conversion to kanji. I have 
watched this process, too. Although it 
is slower than using an ordinary al¬ 
phabet, it is only a factor-of-two slower. 
It allows Japanese speakers more 
natural access to computers. 

One of my noncomputer colleagues 
remarked that pictographic kanji per¬ 
mits experienced readers to grasp the 
essence of a page more quickly than 
alphabetic languages. Other icon-based 
software (Sun, Macintosh, and so on) 
is also pictographic and clearly very
successful. Jun Murai (University of 
Tokyo) wrote recently that Japanese 
network traffic is almost all in kanji. 
Perhaps Japanese prefer to communi¬ 
cate this way. In any case it was 
admitted that Japanese language pro¬ 
gramming would be unmarketable 
outside the country. Consequently re¬ 
search in this direction has never been 
a priority project. 

I believe Japanese-language software
would provide improved productivity.
It would open the door to amateur
software developers in much the
same way that the PC did in the US. 
Further, the lessons learned from de¬ 
veloping Japanese software could then 
be applied to other Asian languages 
for countries whose markets are just 
now opening for computer technology. 
Even the current wave of Japanese 
word processors has already fueled 
very good display technology. A 
Japanese PC on a secretary’s desk has 
much better graphics capability than 
the US counterpart. 

The paper on JSD’s Faset project in¬ 
terested me because it represented one 
of the major projects undertaken by 
this group. I was disappointed with the 
progress they have made. Some other 
listeners seemed to feel similarly. They 
questioned the use of inefficient Lisp 
as an output language. They wondered 
why logic programming (Prolog) was 
not used, and why the tools were not 
better integrated. This work strikes me 
as being so high level that it suffers 
from a lack of concrete focus. 

John Knight (University of Virginia) 
is a recognized expert on safety-critical
software, such as that used in
nuclear power facilities, avionics systems,
or dangerous medical appliances.
He discussed the problems associated 
with designing software with failure 
rates of $10^{-9}$ per event or per hour. For
extremely reliable hardware systems 
the usual approach is redundancy— 
shown both by analysis and experi¬ 
ment to be effective. For software the 
same technique is often used. That is, 
different contractors develop software 
with identical specifications. Unfortu¬ 
nately, Knight’s research shows that the 
usual assumption of independence 
between failures of redundant systems 
is often invalid for software. Difficult 
parts of the design are likely to lead to 
errors no matter who codes them. He 
concluded, very succinctly, that he is
“scared to death” when he thinks about
some of the places where such safety-critical
software is used.
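
The effect of a failed independence assumption can be shown with a
little arithmetic. The Python sketch below is an illustrative model
only, not Knight's analysis: it supposes that a small share of inputs
exercises a difficult part of the design on which every version is
more likely to fail. All probabilities are hypothetical.

```python
def marginal_failure(q_hard: float, p_hard: float, p_easy: float) -> float:
    """Probability that one version fails on a randomly drawn input."""
    return q_hard * p_hard + (1.0 - q_hard) * p_easy


def joint_failure(q_hard: float, p_hard: float, p_easy: float) -> float:
    """Probability that two separately written versions both fail.

    The versions fail independently given the input, but both find the
    same inputs hard, which couples their failures overall.
    """
    return q_hard * p_hard**2 + (1.0 - q_hard) * p_easy**2


# Hypothetical numbers: 1% of inputs hit a difficult part of the design.
q_hard, p_hard, p_easy = 0.01, 0.1, 1e-5

single = marginal_failure(q_hard, p_hard, p_easy)
assumed = single**2  # what full independence would predict
actual = joint_failure(q_hard, p_hard, p_easy)

print(f"one version fails:          {single:.3e}")
print(f"both fail, if independent:  {assumed:.3e}")
print(f"both fail, shared hard part: {actual:.3e}")
```

With these numbers the pair fails together roughly a hundred times
more often than the independence assumption predicts.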

McGarry’s paper was particularly
relevant to me. He provided comparisons
between software projects using
Fortran and Ada. He showed that, sur¬ 
prisingly, the percentage of time spent 
in the design, test, and code stages of 
a big project was about the same with 
either language. Further, the error 
profiles were also about the same, al¬ 
though Ada programs had fewer in¬ 
terface errors. The US NASA approach 
to improving software quality lies in 
“understanding” many of the measures 
of software quality that have been de¬ 
veloped over the past 15 years, 
“analysis” or defining relationships 
between the software process and 
software product, and “automation” or 
the development of software tools to 
improve quality. Much of the research 
has been done within the Software 
Engineering Laboratory jointly sup¬ 
ported by NASA, the University of 
Maryland, and Computer Sciences 
Corp. 

T. Yamazaki represented Nippon 
Steel’s view that employee motivation 
and technology are like two wheels 
on the same axle. Both must have equal 
diameters, or straight-line motion to¬ 
ward the company’s goal cannot occur. 

A similar view was expressed by 
Masanori Teramoto. He estimated that 
NEC employs approximately 15,000 
people in 2,500 groups associated with 
software development. They hold 
meetings in production design, soft¬ 
ware quality control, and evaluation 
and self help. The meetings involve 
all levels of company employees, in¬ 
cluding top management, and are very 
goal oriented. 

NEC estimates that in 1991 it will 
write 140 million lines of code. It ex¬ 
pects BKLOC to be less than 0.1. Fur¬ 
ther, its reuse percentage (code that 
can be used in other applications 
without rewriting) will increase from 
50 percent in 1987 to 60 percent in 
1991. 
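
To put the BKLOC figure in absolute terms, here is a short Python
calculation. The helper names are hypothetical; the line count and the
0.1 limit are the NEC figures quoted above.

```python
def bugs_per_kloc(bugs: float, lines_of_code: float) -> float:
    """BKLOC: bugs (errors) per thousand lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return bugs / (lines_of_code / 1000.0)


def bug_ceiling(lines_of_code: float, bkloc_limit: float) -> float:
    """Largest bug count that still meets a given BKLOC limit."""
    return (lines_of_code / 1000.0) * bkloc_limit


nec_lines = 140_000_000  # lines NEC expects to write in 1991
print(round(bug_ceiling(nec_lines, 0.1)))
```

A BKLOC below 0.1 across 140 million lines corresponds to fewer than
14,000 bugs in a full year's output.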

As a side remark, Teramoto noted 
that Unix helped increase NEC pro¬ 
ductivity. The same system is used 
across a wide variety of computers, and 


hence less relearning was needed. My 
own experience is that Unix is a won¬ 
derful system for software hackers who 
love its flexibility and abbreviated 
command syntax. It is also an awful 
system for the inexperienced who are 
easily confused by these same features. 

Miscellaneous 

An article in Japan Times by John 
Boyd, entitled “Is a hard fall awaiting 
American soft?” presents some conclu¬ 
sions that dovetail with the remarks 
about Japanese word processors. Boyd 
details a speech made by Bill Totten, 
president of Ashisuto K.K., a Japanese 
importer and distributor of foreign 
software. Some excerpts follow: 

The worst mistake (Americans) 
could make was to underesti¬ 
mate the Japanese software 
competition. I think the 
American software industry 
needs to face up to the fact that 
things are changing rapidly. 

And unless they make sub¬ 
stantial changes in the way 
they’re handling their business, 
they’re going to lose this in¬ 
dustry like they lost the televi¬ 
sion, automobile, and a lot of 
other industries. The reasons 
behind the change sound 
frighteningly familiar: high US 
costs and prices compared to 
Japanese offerings; a parochial 
business focus and an unwill¬ 
ingness on the part of many 
US software companies to even 
listen to the needs of overseas 
customers. 

The 2-byte character code is a good 
example. Virtually all Japanese software 
uses these 2-byte codes, which are 
necessary to represent the 3,000 or 
more kanji characters. Several years ago 
IBM issued a new standard to support 
the 2-byte code, but few US software 
companies currently use it because it 
isn’t necessary in the US or Europe. 
Totten says, 
Japanese companies are coming
out with a lot of software
that is now functionally the
same as American software,
but is better manufactured,
[has] few bugs, more clearly
documented source code, and 
is smaller and faster...US 
companies are on the verge of 
losing the Japanese market, 
and as Japanese distributors 
grow and try to expand to Asia 
and Europe these markets will 
be lost too. 
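The 2-byte character code mentioned above can be illustrated with a few lines of Python. This is only a sketch: it uses Shift JIS as one example of a double-byte Japanese encoding, which is an assumption, since the column does not name the encoding or the IBM standard involved.

```python
# Shift JIS is used here purely as an example of a 2-byte kanji encoding.
ENCODING = "shift_jis"

kanji_text = "漢字"   # the word "kanji": two characters
ascii_text = "A"      # a plain ASCII letter for comparison

kanji_bytes = kanji_text.encode(ENCODING)
ascii_bytes = ascii_text.encode(ENCODING)

# One byte can distinguish 256 values; two bytes can distinguish 65,536,
# which is why thousands of kanji need a 2-byte code.
single_byte_values = 2 ** 8
double_byte_values = 2 ** 16

print(f"{len(kanji_text)} kanji -> {len(kanji_bytes)} bytes")
print(f"{len(ascii_text)} ASCII letter -> {len(ascii_bytes)} byte")
print(f"1 byte: {single_byte_values} values, 2 bytes: {double_byte_values} values")
```

Software that assumes one byte per character will miscount lengths and split such characters in the middle, which is the kind of incompatibility the passage describes.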

My own experience so far is that 
these observations do not yet apply to 
scientific software. I have seen very few 
examples of Japanese engineering 
software in use. In fact, I have repeat¬ 
edly been told that the main reason 
that Japanese scientists like the Cray 
rather than NEC, Fujitsu, or Hitachi 
supercomputers is not because Cray 
computers are faster. It is because so 
many well-tuned engineering and 
analysis packages run on Crays. Un¬ 
less Japanese supercomputers sell well, 
software vendors will not be very mo¬ 
tivated to create versions that are spe¬ 
cific to them. 

In the PC and workstation world 
things are quite different. Any computer 
with a graphical or menu interface is 
likely to run Japanese software because
of the need to represent kanji. 

[Editorial Board member David 
Kahaner travels in Japan on assign¬ 
ment with the US Office of Naval Re¬ 
search, Far East. His comments are his 
own; they do not express any official 
policy.] 
Optics: The next wave?

Positioning itself for an anticipated 
“next wave” in the electronics in¬ 
dustry—electrooptics and integrated 
optics—is Atlanta’s Georgia Institute of 
Technology. Researchers there believe 
a recently completed 100,000-sq ft ad¬ 
dition to the Microelectronics Research 
Center will contribute to Tech’s at¬ 
tempts to understand optic device 
technology. 

The three-story Center building 
contains a “photon corridor” for opti¬ 
cal research (in which researchers pipe 
laser beams), 7,000 sq ft of clean room 
space, a laboratory, and office and 


conference room space for 40 re¬ 
searchers and 80 students. In the pho¬ 
ton corridor, the Center’s optics 
laboratories share more than a million 
dollars’ worth of laser equipment. 

Center researchers, in conjunction with industry partnerships, study
the devices, which communicate to and from optical fibers and are too
costly and risky for one company to develop. They investigate ways of
using electron wave effects in efforts to build 1-billion-bps circuits.
Much of the research focuses on developing high-efficiency solar cells
that combine silicon with other materials and analog silicon devices
based on neural networks for applications in instrumentation.

Close ties with Tech’s Manufacturing Research Center offer the
capability for developing the technology and moving it into production.
This chance to work together with exotic materials or combine
electronics with photonics allows researchers to maintain control of
the complete process and permits fine-tuning as needed. “We can
fabricate new ICs,” says the Research Center’s director, Richard
Higgins, “and they [manufacturing] can determine how to package and
assemble them into reliable working equipment.”


Micro bits

Merit, IBM, and MCI established Advanced Network and Service, Inc.,
a not-for-profit organization to manage and operate the federally
funded NSFNET.

Concerned about trade with the European Community? US interested
parties may now telephone (301) 921-4164 for a recorded hotline
message on draft EC laws and standards that might create technical
trade barriers.

Open Software Foundation announces release of its Unix-based
operating system, OSF/1. OSF/1 features built-in multiprocessing, a
microkernel, and added security.

On April 8 the US Federal Communications Commission begins testing
six US, European, and Japanese proposals in efforts to select a
US HDTV standard by 1993.

The US National Research Council recently created the Scientific
and Technical Information Board to find ways to structure, manage,
and communicate scientific and research data sets.

RS/6000 leads in performance 

Testing of recent RISC products by 
Workstation Laboratories places the 
IBM RS/6000 performance ratings 
above those of other well-known 
RISCs. The 6000 achieved a rating of 
13,594 Khornerstones in six floating¬ 
point-intensive tests written in C and 
Fortran. The DECstation 5000 came in 
at 10,417 and the Mips Magnum 
(RS3230) at 10,567 Khornerstones. 

As the table below shows, the 6000 
also led in vector and transaction pro¬ 
cessing tests. Vector tests require 
completion of 100,000,000 multiplica¬ 
tion operations using multiple 25 x 25 
matrices. The transaction processing 
test represents multiuser performance 
and is highly disk-intensive. 
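The size of the vector test can be checked with a little arithmetic. The sketch below rests on two assumptions that the article does not state: that each "multiplication operation" is one scalar multiply in a conventional triple-loop product of two 25 x 25 matrices, and that the Vector column of the Benchmark data table reports multiplications per second. The variable names are illustrative.

```python
# Assumption: one "multiplication operation" is one scalar multiply inside a
# conventional (triple-loop) product of two N x N matrices.
TOTAL_MULTIPLICATIONS = 100_000_000
N = 25

mults_per_product = N ** 3                                    # multiplies per 25 x 25 product
products_needed = TOTAL_MULTIPLICATIONS // mults_per_product  # matrix products in the test

# Vector column as printed in the Benchmark data table.
vector_ratings = {
    "IBM RS/6000/320": 5_882_352,
    "DECstation 5000": 1_677_852,
    "Mips RS3230": 2_207_505,
    "Sun Sparcstation 1": 628_141,
    "Tektronix XD88/10": 645_161,
    "Compaq DP486/25": 544_366,
}

# Assumption: the rating is multiplications per second, so the elapsed time
# for the whole test would be the total divided by the rating.
implied_seconds = {
    system: TOTAL_MULTIPLICATIONS / rating
    for system, rating in vector_ratings.items()
}

print(f"{mults_per_product:,} multiplies per product, {products_needed:,} products")
for system, seconds in implied_seconds.items():
    print(f"{system:20s} {seconds:7.1f} s")
```

If the per-second reading is right, the RS/6000 figure corresponds to a run of about 17 seconds, and the slowest system listed to roughly three minutes.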


Learn about LANs on your 
own time 

A recently released self-study tele¬ 
communications series can help you 
learn how to plan and design a tailored 
local area network system. You can 
learn how to implement a system that 
is already on site or integrate systems 
that may have been inherited through 
a merger or acquisition. 

The six-course series from Science 
Research Associates details the IEEE 
802.2/3/4/5 LAN protocols. The
courses adhere to the models and
conventions of the IEEE and the stan¬ 
dard OSI model. Available individually 
or as a series, the courses can be li¬ 
censed for 60 days or one year from 
SRA, 155 North Wacker Drive, Chicago, 
IL 60606-1780. 

Ask for the Local Area Network Ar¬ 
chitectures and Implementation Series. 

Multichip modules 

Application-specific electronic subassemblies
called multichip modules
may soon find their way into electronic 
equipment applications. MCMs provide 
space-saving interconnection for bare 
or unpackaged semiconductor chips, 
which are then protected by a coating 
or an enclosure. 

A Business Communications Com¬ 
pany, Inc. study predicts that MCMs will 
become an important ingredient for 
high-performance electronic systems 


incorporating CMOS and VLSI tech¬ 
nologies, as well as high-speed sys¬ 
tems. BCC predicts that this market will 
total $200 million in 1990 and $896 
million by 1995. Joseph Castrovilla, 
BCC electronics analyst and study au¬ 
thor, states that these “subassemblies 
will provide solutions to meet the re¬ 
quirements for advanced electronic 
equipment for state-of-the-art compo¬ 
nent density, high speed, and reduced 
size.” 

Contact BCC, 25 Van Zant Street, 
Norwalk, CT 06855, for a copy of the 
$2,650 Multichip Modules in Electron¬ 
ics: New Technologies and Markets. 



Hootman adds to Board 

Editor-in-Chief Joe Hootman recently 
appointed Ashis Khan to the Editorial
Board of IEEE Micro to review sub¬ 
mitted manuscripts. As a senior tech¬ 
nical specialist at Mips Computer 
Systems, Khan consults with custom¬ 
ers designing systems based on the 
company’s RISC architecture. He has 
written articles and delivered public 
seminars on that topic. 

Khan earned his MSEE degree at 
State University of New York at Stony 
Brook and his BTech degree at Indian 
Institute of Technology in Kharagpur. 
He is a senior member of the IEEE. 
Benchmark data.

System                  CPU/MHz       Floating-point   Vector       Transaction
                                      Khornerstones                 processing
IBM RS/6000/320         IBM RISC/20   13,594           5,882,352    35.64
DECstation 5000 200 cx  R3000/25      10,417           1,677,852    18.29
Mips RS3230/Magnum      R3000/25      10,567           2,207,505    11.96
Sun Sparcstation 1      Sparc/20       3,778             628,141    19.15
Tektronix XD88/10       88000/20       5,645             645,161    16.42
Compaq DP486/25         i486/25        6,311             544,366    18.71
Managing Editor, IEEE Micro; PO Box 3014, Los Alamitos, CA 90720-1264. 


Joe Hootman 

University of 
North Dakota 


DSP and related products 

■ Code development enhanced 

DSPlay XL 3.16 code generation software ex¬ 
pands available DSP and related functions by 
more than one third over its previous version. It 
includes a graphic editor, function library, graphic 
displays, filter design programs, assembler, and 
real-time analog I/O support. Users create code 
using block diagrams from the standard library 
or customize their own using the built-in as¬ 
sembler. The provided monitor offers debugging 
features. DSPlay XL 3.16 supports AT&T WE
DSP32 and DSP32C processors. Burr-Brown
Corporation; $1,495, available from stock.

Reader Service No. 10 

■ Software analyzes ADC performance 

ZPA1000 software evaluates the performance 
of analog-to-digital converters (ADCs) by ana¬ 
lyzing signals in the time and frequency domains, 
and producing linearity measurements. The 
mouse-driven software operates as a digitizing 
oscilloscope, spectrum analyzer, and histogram 
analyzer, accepting analog inputs from 16 bits at
150 kHz to 12 bits at 10 MHz. The save/load function
evaluates converter performance and system
parameters at the final test stage. ZPA1000 allows 
for hard-copy printouts on designated graphics 
or laser printers. Burr-Brown Corporation; $995 
in unit quantities from stock. 

Reader Service No. 11 

■ C compiler supports signal processor 

An ANSI C optimizing compiler for Texas In¬ 
struments’ TMS320C25 digital signal processor 
generates assembly language, supports the 
writing of C-level interrupts, and supplies C li¬ 
braries in source form. The compiler, which 
supports the TI assembly format, uses the DSP 


chip to create optimized code. Its companion 
assembler produces relocatable object code, 
listing files, and diagnostic messages. Assembler 
utilities include librarian, cross-reference gen¬ 
erator, and object code converters. The assembler 
supports Hewlett Packard- and TI-tagged formats,
and accepts TI assembly mnemonics and directives.
BSO/Tasking; from $1,695 for the compiler,
assembler, linker, and librarian. 

Reader Service No. 12 

■ Midas touch for industrial data 

Measuring 7 inches wide, 10 inches deep, and 
6 inches high, the Midas-150 PC provides in¬ 
dustrial data acquisition and process control. Its 
features include a 16-MHz 286-compatible pro¬ 
cessor, 1 Mbyte of RAM, three storage options 
(1.44 Mbyte, 3.5-inch floppy; 40-Mbyte hard drive; 
or semiconductor RAM disk), a seven-slot passive 
backplane, and integral field wiring screw ter¬ 
minations. The Midas-150 uses compatible pro¬ 
cess control software, such as Genesis, LT/ 
Control, and Lab Tech Notebook. ABAC Corpo¬ 
ration; from $3,200. 

Reader Service No. 13 
■ A/D board offers 100-kHz 
transfers 

The DT2812 analog and digital I/O 
board conducts direct memory access 
(DMA) transfers at rates up to 60 kHz. 
Two digital-analog converters (DAC), 
three 16-bit counter/timers, a programmable
pacer clock, and interrupt support
allow users to work in scientific
and industrial applications. The 
DT2812-A board offers maximum 
digital-to-analog and analog-to-digital 
transfer rates of 100 kHz. Both boards 
are IBM compatible. Data Transla¬ 
tion; $629 OEM (DT2812), $695 OEM 
(DT2812-A). 

Reader Service No. 14 


■ DSP program features 
graphics 

A DSP analysis software package 
with accompanying graphics capability 
performs calculations and analysis us¬ 
ing one-dimensional fast Fourier 
transforms. The program—called Fou¬ 
rier Perspective III—features digital and 
median window filtering, linear systems 
functions, windowing, convolution and 
correlations, and a number of 
mathematical functions for conducting
analysis. Fourier’s Data View saves 
graphics in encapsulated Postscript and 
other formats, and it outputs to a range 
of printers and plotters. Alligator 
Technologies; $695, available one week 
ARO. 

Reader Service No. 15 



Fourier Perspective III software system 


■ Triggers set aim for analysis 

The triggering system of the R380 
spectrum analyzer and digital oscillo¬ 


scope permits precise setting levels 
using on-screen tools or remote con¬ 
trol. Connected to a serial port, the R380 
hardware includes two 14-bit A/D 
channels; input signal gain ranges; and 
a signal cursor and two markers to 
measure voltage, time difference, and 
frequency. Rapid Systems; $1,995. 

Reader Service No. 16 

■ Voice processing unit for XTs 

A four-port voice processing com¬ 
ponent allows developers to construct 
DSP-based systems on 8088 and 8086 
platforms. The new IBM XT-compat¬ 
ible RDSP/4108 uses an 8-bit hardware 
interface to the host computer. The 
RDSP/4108 can work with the 16-bit 
RDSP/4208 voice processing unit in an 
80286 or 80386 host computer. 
Rhetorex; $995. 

Reader Service No. 17 

■ DSP board available for Next 

A DSP-based board for the Next 
computer features five 27-MHz digital 
signal processors to execute audio and 
communications applications. Four of 
the DSP56001 chips on this Quint 
Processor board function as slave 
processors to perform computation; the 
fifth operates as an I/O processor that
manages dynamic RAM, SCSI mass 
storage, and interprocessor communi¬ 
cations. The Quint Processor contains 
five DSP ports for analog and digital 
I/O. Ariel Corporation; $6,995, avail¬ 
able in 30 days. 

Reader Service No. 18 



Quint Processor DSP board 


■ Coprocessor cards support 
Suns 

Two coprocessor cards operating at 
13.3 MIPS (million instructions per 
second) conduct DSP and I/O for Sun 
Microsystems’ Sparcstation and S-bus- 
compatible computers. The S-56 and 
S-56X cards, bundled with debugging 
and library tools, process real-time tasks 
in Unix-based environments. They also 
include up to 192 Kbytes of zero-wait- 
state memory and a Next-compatible 
DSP port. Ariel Corporation; from 
$2,495 or $1,495 OEM (S-56), $2,995 
or $1,995 OEM (S-56X). 

Reader Service No. 19 

■ Self-calibrating converter 
adjusts for errors 

A 12-bit, plus-sign, analog-to-digital 
converter calibrates itself to adjust lin¬ 
earity and zero errors. According to the 
company, the ADC1251 corrects inter¬ 
nal errors, ensuring no missing codes 
over temperature with zero errors of 
±1 LSB (least significant bit) and full-scale
errors of ±1.5 LSB. Available in a
24-pin, dual-in-line package, the
ADC1251 accomplishes conversions in
8 µs. A sample-and-hold function allows
its use in DSP applications.
National Semiconductor; $17.50
(ADC1251CIJ, 100s).

Reader Service No. 20 
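To give a feel for what the ADC1251's error figures mean in volts, the sketch below converts LSB-denominated errors into voltages for a 12-bit magnitude. The 5-V full-scale range is an assumption chosen for illustration and does not come from the listing. The throughput figure ignores any acquisition or calibration time, so it is an upper bound only.

```python
# Values from the listing.
MAGNITUDE_BITS = 12        # 12 bits of magnitude (the sign bit is separate)
ZERO_ERROR_LSB = 1.0       # zero error, +/- LSB
FULL_SCALE_ERROR_LSB = 1.5 # full-scale error, +/- LSB
CONVERSION_TIME_S = 8e-6   # 8 microseconds per conversion

# ASSUMPTION for illustration only: a 5-V full-scale range per polarity.
V_FULL_SCALE = 5.0

codes = 2 ** MAGNITUDE_BITS                  # distinct magnitude codes
lsb_volts = V_FULL_SCALE / codes             # size of one code step
zero_error_volts = ZERO_ERROR_LSB * lsb_volts
full_scale_error_volts = FULL_SCALE_ERROR_LSB * lsb_volts

# Upper bound on throughput if conversions ran back to back.
max_conversions_per_s = 1.0 / CONVERSION_TIME_S

print(f"1 LSB             = {lsb_volts * 1e3:.3f} mV")
print(f"Zero error        = +/-{zero_error_volts * 1e3:.3f} mV")
print(f"Full-scale error  = +/-{full_scale_error_volts * 1e3:.3f} mV")
print(f"Max conversions/s = {max_conversions_per_s:,.0f}")
```

With a different reference voltage the millivolt figures scale proportionally; only the LSB counts are fixed by the listing.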

■ Driving A/D channels 
simultaneously 

The TSI 98304 coprocessor drives 
A/D and D/A converter channels simul¬ 
taneously up to 20K samples/second 
per channel. These eight 8-bit channels 
work with a TMS320C25 digital signal 
processor with up to 2 Mbytes of dual- 
ported data memory. The memory pro¬ 
vides buffering between the real-time 
I/O system and the time-shared pro¬ 
cesses for an uninterrupted flow of 
data. The TSI 98304 fits the DIO-II slots 
in the HP 9000 series 300 and 400 
workstations. Tetra Systems; $3,995 
(2 Mbytes) or $2,995 (no dual-ported 
memory), delivery 6-8 weeks ARO. 

Reader Service No. 21 
■ Array processor performs at 
33 Mflops 

Compatible with VMEbus systems, 
the DSP200 general-purpose, floating¬ 
point array processor performs at a 
maximum of 33 Mflops using a Texas 
Instruments CPU. Its P2 connector 
implements a Quad 16-bit I/O bus for 
processor data interchange. This bus 
conducts data transfers at 8 MHz in one 
clock cycle. Twelve-bit converters sup¬ 
port analog I/O, and front-panel con¬ 
nectors support 64 bits of digital 
I/O. Olsson Engineering; $8,495, de¬ 
livery 60 days ARO. 

Reader Service No. 22 

■ Converter/amplifier digitizes 
input signals 

A T/H amplifier combines with a 
sampling A/D converter to digitize static 
and dynamic analog input signals. The 
MN6774, based on the MN774 12-bit, 
8-µs A/D converter, targets military,
aerospace, and industrial applications. 
J, K, S, and T models address differing 
levels of performance, including 
unipolar error, zero error, and abso¬ 
lute accuracy specifications. Micro 
Networks; from $105 (100s); delivery 
in 8-12 weeks. 

Reader Service No. 23 

■ Card offers 12-bit resolution 

A full-size data acquisition board 
offers 12-bit resolution for both A/D 
and D/A conversions. The R812G in¬ 
cludes software-selectable A/D input 
ranges, built-in interrupt and direct 
memory access, two D/A outputs, and 
programmable gains of 1, 2, 4, 8, and 
16. The card handles the following measurement
applications: temperature, pressure,
proximity, level, height, distance, and
force. Rapid Systems; $595.

Reader Service No. 24 

■ Interface reduces noise 

Designed for applications requiring 
exact analog voltages, the DA700 in¬ 
terface features six or eight channels 
of 12-bit conversion with closely 
matched digital/analog converters and 


output amplifiers. Each analog signal 
couples with a ground signal, reduc¬ 
ing noise pickup and increasing accu¬ 
racy at output. Developed for the PC/ 
XT/AT bus, DA700 supports jumper- 
selectable unipolar or bipolar opera¬ 
tion. Real Time Devices; $297 (six 
channel) or $359 (eight channel), stock
to two weeks ARO. 

Reader Service No. 25 


Design tools 

■ Analysis software optimizes 
circuit design 

The Clio software package lets de¬ 
signers create a match between design 
and desired performance with auto¬ 
matic circuit optimization. Clio also 
features yield analysis, design centering, 
schematic capture, design specification 
entry, circuit simulation and analysis, 
and results processing and presenta¬ 
tion. The user can control the design, 
analysis, and design processes through 
component menus and mouse opera¬ 
tions on Sun Microsystems and Hewlett 
Packard/Apollo platforms. Electrical 
Engineering Software; $18,750 (network
configuration of four seats).

Reader Service No. 26 

■ Analog circuit program 
increases plot output 

The 2.30 version of ECA-2 software 
upgrade expands the interactive output 
of multiple, worst-case, and Monte 
Carlo plots. It provides linear and/or 
dB values, phase and/or phase delay, 
and component sweeping to optimize 
the circuit or predict end of life. A built- 
in editor, real-time graphics, and fun¬ 
damental analog simulation capabilities 
are other features. ECA-2 2.50 requires 
a minimum of 256K of RAM and MS- 
DOS 2.0 or later to run on IBM-com¬ 
patible computers. Tatum Labs; $775. 

Reader Service No. 27 

■ Spicing up circuit simulation 

A software program simulates cir¬ 
cuits down to path delays critical for 
40-MHz-or-greater digital clock fre¬ 


quencies in designs for PCBs or ICs. 
Contecspice, based on the public 
domain Spice3C1 circuit simulator,
provides algorithms for modeling 
coupled lossy transmission lines. This 
mixed-level, mixed-mode program 
targets engineers who design high- 
performance PCBs, digital and analog 
ICs, and device packaging. Contec 
Microelectronics USA; $1,998 (PC) or 
$5,419 (Sun workstation). 

Reader Service No. 28 

■ Schematics generated and 
checked 

The Vutrax-II GES schematic entry 
program includes features for checking 
drawing accuracy and analyzing critical 
unconnected inputs. Users can gen¬ 
erate circuit diagrams in many text 
fonts and line widths, and subcircuits 
and repetitive figures may be boxed 
and duplicated. Vutrax-II interfaces 
with analog and digital simulation 
programs from Tatum Labs and other 
companies. It requires a hard disk with 
4 Mbytes available. Tatum Labs; $495. 

Reader Service No. 29 

■ Design tools for ASIC 

A design kit for application-specific 
ICs offers a range of tools for the Dazix 
design environment on the Sun-4 
workstations. Cell symbols, simula¬ 
tion models, net list and test pattern 
converters, and an ASIC management 
environment come in each Fujitsu kit. 
The available libraries cover tech¬ 
nologies such as complementary metal- 
oxide semiconductor (CMOS), bipolar 
CMOS, and emitter-coupled logic. 
Fujitsu; free to qualified mutual cus¬ 
tomers of Fujitsu and Dazix. 

Reader Service No. 30 

■ Hspice integration 

The Hspice optimizing simulator 
now integrates into the Viewsim/SD 
simulation environment. Tight feed¬ 
back loops between the two compo¬ 
nents’ A/D simulators allow for inter¬ 
active, mixed-signal simulation. Hspice 
generates compatible data for the 
Viewtrace drawing package for analog 
display, output, and analysis. Hspice 
and its server for Viewsim/SD are 
available for Sun Sparcstation, 
DECstation, IBM RS/6000, and VAX/ 
VMS. Meta-Software. 

Reader Service No. 31 

■ C-based debugging 

The Silos II 90.1 software release is 
a C-based version of the original Silos 
logic and fault simulation program. It 
features two-dimensional interactive 
debugging, time enhancements that 
support Future Net-compatible ASIC 
part libraries, spike simulation, and 
analog behavioral modeling. Users can 
create custom reports using the 90.1’s 
software tools. Simucad; immediate 
delivery. 

Reader Service No. 32 

■ More midlayers in PCB 
program 

Version 2.0 of the Tango-PCB soft¬ 
ware offers component placement as¬ 
sistance (manual, interactive, or 
automatic), polygon fill, graphics sup¬ 
port, and the addition of four midlayers 
for a total of 23 layers. Tango-Route 
2.0 includes autorouting capabilities, 
editable power/ground planes, ex¬ 
panded memory support, and in¬ 
creased maze routing, according to the 
company. Accel Technologies; from 
$495 (individual entry-level tools) to 
$1,695 (Tango-PCB and Tango-Route),
available from stock.

Reader Service No. 33 

■ New wave of Pspice 

The Pspice circuit analysis version 
4.04 contains an OS/2 real-time wave¬ 
form viewer that allows viewing of 
output waveforms while running a 
simulation. Other enhancements in¬ 
clude previously defined global pa¬ 
rameters for new expressions, and two 
library files containing seven amplifier 
models and more than 400 SCR and 
triac models. Micro Sim Corporation; 
from $950 to $29,900. 

Reader Service No. 34 
■ Smart prototype 
development 

Designers using Smartmodels can 
simulate and verify a system-level soft¬ 
ware design before entering the hard¬ 
ware prototype phase. Smartmodels 
interface with Altera Multiple Array 
Matrix (MAX) erasable, programmable 
logic devices (EPLD) and tools to en¬ 
sure accurate modeling of logic delays 
in the compiled EPLD design. 
Smartmodels are compatible with a 
range of systems and workstations. 
Logic Automation; $7,900 (subscription 
service fee per workstation). 

Reader Service No. 35 

■ Microwave Hspice 

A host of microwave components is 
now available on the H9001, the latest 
version of the Hspice optimizing ana¬ 
log circuit simulator. These components 
include lossy transmission lines, 
geometric GaAs FET (gallium arsenide,
field-effect transistor) with backgate,
quasisaturation BJT (bipolar junction
transistor), and submicron metal-oxide
semiconductor FET. The H9001 also
supports algebraic expressions, mul¬ 
tiple simulation viewing, and a user in¬ 
terface for X Windows and Postscript. 
Meta-Software; immediate availability. 

Reader Service No. 36 

■ Mixing text and CAD 
graphics 

Engineers and architects can attach 
part, size, and cost information to CAD 
elements by using the Versadata/386 
package. A designer can visually verify 
drawing elements to which information 
is attached as well as sort and tally this 
information to generate reports. Versa¬ 
data/386 requires a dual-screen con¬ 
figuration for displaying graphics and 
text; a VersaCAD/386 version 5.4, revi¬ 
sion 7 or greater; and at least 4 Mbytes 
of memory. Computervision; $995. 

Reader Service No. 37

■ 3-D modeling 

The Betasoft-R software program 
analyzes electronic boards for thermal 


reliability. It performs three-dimen¬ 
sional modeling on the complex flow 
and thermal fields by evaluating heat 
conduction, convection, and radiation. 
The menu-driven program outputs 
thermal maps of a board and its com¬ 
ponents. Betasoft-R also comes with a 
library of 2,500 components, and it 
operates on any XT/AT/PS/2-compatible
computer using DOS 3.0 or higher.
Arctos Systems Corporation; $2,395 
(model R-1S, single-sided), $3,595 
(model R, double-sided). 

Reader Service No. 38 


Computers and keyboards 

■ Scanners provide two 
options 

Two bar code scanners collect data 
by different methods. The Dynabar-232 
scanner plugs into a Psion hand-held 
computer, enabling data collectors to 
scan with one hand. The result is a 
combination bar code reader, terminal, 
and RS-232 communications link. 

For industrial, retail, and office ap¬ 
plications with critical power con¬ 
sumption needs, the Welch Allyn bar 
code model offers a press-to-operate 
switch that directly controls power to 
the scanner. Psion. 

Reader Service No. 39 



Dynabar-232 bar scanner 

■ Six microterminals for OEMs 

The 100 and 200 series of CTM 
microterminals provides an operator 
interface or a control panel solution for 
original equipment manufacturers. The
CTM150 and 170 models come with
transistor-transistor-logic-level RS-232 or
multidropped RS-422 ports. The 
CTM200 (with RS-232) and CTM220 
(with RS-422) allow access to 16 func¬ 
tions defined by the user. Models 230 
and 270 combine backlit function keys, 
internal beeper, and 32-pin connector 
with the standard features of the 
CTM200 and 220. Burr-Brown Corpo¬ 
ration; from $195 (CTM150) to $295 
(CTM270), available from stock. 

Reader Service No. 40 

■ One-handed data collection 

Featuring a numeric keypad, pro¬ 
grammable function keys, and a touch- 
panel liquid crystal display, the Data 
Mate PHT-60 hand-held terminal is 
available with 256 or 512 Kbytes of 
memory. 

The LCD allows the viewing of 6 
lines x 24 characters, or 128 x 192 
pixels in graphics mode. The terminal 
reads four bar code types (Code 39, 
UPC/EAN/JAN, 2 of 5, and Codabar) 
and communicates with the host com¬ 
puter via an RS-232C serial port.
Timekeeping Systems. 

Reader Service No. 41 



Data Mate PHT-60 terminal 


■ Hand-held computer 
briefcase 

The Executive Hand-Held Computer 
Kit unzips to reveal a computer and 


applications software on one side, and 
a personal diary/calendar binder on the 
other. The IBM-compatible LZ termi¬ 
nal contains 32 Kbytes of internal RAM 
and 64 Kbytes of internal ROM. 

The kit includes a Lotus 1-2-3-com¬ 
patible spreadsheet, financial programs, 
eight games, and custom design capa¬ 
bilities. It also comes with a 64-Kbyte 
EPROM (erasable, programmable, read¬ 
only memory) data package for off-line 
storage. Psion; $720. 

Reader Service No. 42 
Executive Hand-Held Computer Kit 

■ Low-power EL displays for 
medical applications 

Solid-state electroluminescent (EL) 
displays now offer power consumption 
ratings comparable to LCDs. A rede¬ 
signed Planar model requires as little 
as 3.6 watts for a typical image of
waveforms and text found in medical 
instrumentation. The EL displays offer
twist-tab packaging. Planar Systems. 

Reader Service No. 43 

■ Touch screen dynamics 

A flat-panel monitor includes an 
EGA-compatible (enhanced graphics
adapter) electroluminescent display 
and a high-resolution touch screen. 
According to the company, the El 
Touch's analog screen reads user con¬ 
tacts with the solid-glass sensor even 
with a dirty screen. The monitor fea¬ 
tures a display area of 8 by 5 inches 
and measures 3 inches deep. It plugs 
into an EGA card and requires no 
modifications to software. Micro Touch; 
$1,995 or $1,495 (OEM). 

Reader Service No. 44 


■ Alternative mouse device
includes stylus option 

A touch-sensitive tablet called the 
Unmouse provides another option for 
mouse and trackball users. Running 
one’s finger along the 3 x 4-1/2-inch 
tablet moves the cursor, and pressing 
lightly simulates the clicking of a mouse 
button. Used as a function keypad, the 
Unmouse comes with templates that 
slip under the glass tablet to emulate 
PC function keys. Word Perfect and 
Lotus 1-2-3 templates are included, as 
well as blank templates and software 
for custom keystroke programming. In 
the Unmouse’s Absolute mode, users 
can draw on the tablet with a stylus. 
The Unmouse is compatible with IBM 
machines. Micro Touch; $235. 

Reader Service No. 45 
■ Software emulates 
terminals 

An emulator for PCs running under 
MS-DOS, the Zstem 220 software provides
terminal emulation on PW2s, PCs,
XTs, ATs, PS/2s, and compatibles. It 
also emulates a DEC VT220 terminal 
with VT320 enhancements such as sta¬ 
tus line, International Standards Orga¬ 
nization characters, and screen saver. 
The Zstem 220 software includes error- 
free file transfer protocols; printer, 
plotter and network drivers; and a script 
language. Zstem 220 is available from 
Unisys under its Desktop III contract 
with the US Department of Defense, 
or call KEA for ordering information. 
KEA Systems. 

Reader Service No. 46 
What's #1 in 
the hearts of 
its readers? 

IEEE Micro!! 

How does 
it do it? 

With quality 
articles! 

Who keeps 
the quality 
high? 


The article 
reviewers! 



Become a reviewer and 
join the #1 team. 

Want to help keep IEEE Micro #1? 
Editor-in-Chief Joe Hootman seeks 
more technical reviewers—people 
interested in seeing that the articles 
published in IEEE Micro continue to 
be of the highest quality. The technical 
review process is a crucial step in 
providing readers with correct and 
timely information so they can keep 
up with their ever-changing 
profession. 

Send your professional 
information directly to: 

Joe Hootman 

University of North Dakota 
PO Box 7165 
Grand Forks, ND 58202 
Single-board computers and 
controllers 

■ Microcontroller allows 
added access 

Measuring 3-1/2 x 4-1/2 inches, the 
Control RII 8031-based microcontroller 
module allows access to the address, 
data, and control bus through newly 
added expansion headers. The mod¬ 
ule also incorporates space for 8 Kbytes 
of on-board static RAM and jumpers 
that allow for alternate processors using
ROM languages. The unit supports
applications such as data logging, lab 
instruments, robotics, motor and power 
control, and remote monitoring. Sintec; 
$64.95. 

Reader Service No. 47 

■ One-slot board for 
backplanes 

Slot spaces are usually at a premium 
for standard PC/AT bus passive back¬ 
planes. Original equipment manufac¬ 
turers can install the Slot Board/386, 
which occupies one slot instead of 
several. The board includes a 20-MHz, 
32-bit 80386 CPU with a maximum of 
4 Mbytes of RAM as well as I/O and 
disk controllers. Its design serves embedded
applications such as point-of-sale
terminals, medical instruments,
machine control, and diskless work¬ 
stations. Ampro Computers; $1,170 
OEM (100s); available 30 days ARO. 

Reader Service No. 48 

■ Heating up clock rates 

Ice Cap 486 achieves higher clock 
rates by lowering the microprocessor’s 
operating temperature to 0° Celsius and 
controlling the voltage. The ICEC (in¬ 
tegrated circuit, environmentally con¬ 
trolled) component houses this cooling 
technology, allowing a 38-MHz chip to 
register a Landmark Benchmark rating 
of 170, according to the company. The 
Ice Cap 486 operates at 30.6 MIPS 
(million instructions per second) at 38 
MHz. Velox Computer Technology; 
$150 OEM (without microprocessor). 

Reader Service No. 49 


■ Axis management with 
motion controller 

The SRX motion controller manages 
a maximum of eight axes and works 
with a range of computers through the 
RS232 or RS422 serial port. It also
communicates with PLCs (programmable logic
controllers) through 9 bits of user-defined
I/O lines. Motor drives for stepping, 
linear, or servocontroller functions 
control the number of axis configura¬ 
tions. Optimized for microstepping, the 
SRX controls high-resolution motors up 
to 50,000 steps per revolution with an 
appropriate driver and motor. Oregon 
Micro Systems; from $225 (per axis).

Reader Service No. 50 

■ Compact AT-size 
motherboard 

The 13.3 x 8.5-inch K386SX
motherboard accepts up to 8 Mbytes 
of dynamic RAM in a combination of 
256-Kbyte and 1-Mbyte modules. The 
AT-size board includes an Intel 386 SX 
microprocessor, floppy disk controller, 
and LIM (Lotus/Intel/Microsoft/AST 
Research expanded-memory specifi¬ 
cation) 4.0 support. Users can select 8- 
MHz or 16-20-MHz CPU speeds from 
the keyboard. The K386SX board per¬ 
forms at zero wait states with 80-ns 
DRAMs for 20 MHz. Klever Computers; 
$360 (from two to 25 units).

Reader Service No. 51 
Manufacturer Model Comments


Memories

Accutek Microcircuit Corp.; AK594096, AK584096
Compact module series features 4-Mbyte DRAM density for 8-bit
data paths in either SIMM or SIP versions with 60- and 100-ns
access times. A configuration for applications requiring the ninth
bit for parity is available.
Reader Service No. 80

International Microelectronic Products; IMP234XX
Family of 4-Mbit, JEDEC ROMs stores PC word processor and
spreadsheet applications. Organized as 512K x 8 bits or 256K x
16 bits, the battery-powered, CMOS devices access data in 110 ns
to support microprocessors without wait states. From $4.73 each
(5,000s).
Reader Service No. 81

Philips Components-Signetics; 48F010
One-Mbit, byte-wide Flash memory features sector and chip erase
and overerase protection. Erasable in bulk or in 128 increments,
the JEDEC device supports start-up functions and system
operation by keeping a section of code intact. $15 each (1,000s).
Reader Service No. 82

Raytheon Company, Semiconductor Div.; R29771, R29773
Standard bipolar PROM and power-switched SPROM replace the
R29671 and R29673 devices. The 55-ns, 4,096 x 8 devices come in
commercial and military temperature-range versions. $9 each
(100s).
Reader Service No. 83


Processors/controllers

Intel Corporation; 80C186EB
Embedded processor designed as a CPU for mobile applications
such as cellular phones retains its status during power-down
modes. Code-compatible with its 8086 and 80186/80C186
predecessors, the 16-bit unit also controls data in office automation
products and communications and industrial control
applications. $16.95 each (1,000s).
Reader Service No. 84

Mesa Electronics; 6P21
PC bus-compatible coprocessor fits on one 4.2 x 5.5-inch card for
use in I/O-intensive and real-time control applications. Hosts
communicate through a 1-Kbyte, dual-ported RAM. The 96-Kbyte
RAM slave processor remains independent of the host bus except
for communications. Each unit requires 2 Kbytes of host memory
space for communication and one host interrupt line. $335 each.
Reader Service No. 85

Metheus Corporation; 1100/1200 series
Graphics controllers perform 10 million-pixel/s random vector
draw and polygon fills in excess of 60 million pixels/s. The 1100
line offers 1,024 x 768 resolution, 4- or 8-bit planes, up to 256
displayable colors from a palette of 16.7 million, VGA emulation,
and up to 1 Mbyte of on-board memory. The 1,280 x 1,024,
noninterlaced monitor support of the 1200 series features up to 2
Mbytes of on-board memory and VGA pass-through. From
$1,999.
Reader Service No. 86
continued from p. 15

fied that it was only necessary to take the process as far as 
designing a parallel architecture and simulating it. 

The subproject advanced the state of the art in implement¬ 
ing lazy functional languages in a number of ways. It defined 
a parallel model that naturally extends the sequential lazy 
evaluation and preserves the semantics of programs. The 
parallelism information for the model can be determined in 
the compiler using an analysis technique called abstract in¬ 
terpretation. We’ve proved that when the information ob¬ 
tained from the abstract interpretation is used, we retain the 
semantics of the program. We implemented the analysis 
technique in another project. 

We based the parallel execution model strongly on lazy 
evaluation, with some extra features that create subprocesses 
to perform some of the computation. Therefore, the design 
of a parallel machine fell into two fairly distinct halves: a 
machine to support the sequential core of the language, and 
a parallel harness that supports the specifically parallel parts 
of the evaluation model. We designed a sequential abstract 
machine, an improvement on extant abstract machines. We 
then added the extra features determined in Burn 8 to turn it
into an abstract distributed memory architecture, showing 
how to compile code for it. 

We established that most features of functional languages 
can be implemented on current hardware, thus avoiding the 
need to design any specialized hardware. We also prepared 
a demonstration implementation on a transputer network and 
translated the parallel abstract machine code into transputer 
machine code. 

Functional, logic, and object-oriented languages
require automatic storage allocation and deallocation (gar¬ 
bage collection). Subproject B developed a garbage collec¬ 
tion algorithm 3 and subsequently improved it. 

The efforts in this subproject have led to a complete sys¬ 
tem, from a functional program without parallel constructs to 
an implementation on a parallel machine. This exceeds the 
original goal. 

The logic database approach. The main objective in 
subproject C was to design a parallel deductive database 
machine called the Delta Driven Computer. 9 The DDC is 
specialized to execute deductive requests in parallel by using 
relational operations. All relations reside in main memory. 

We based the execution model on an original technique 
called the Alexander method, 10 which was designed to merge 
recursive views in queries. The advantage of this merge op¬ 
eration is that it produces a set of rules that can be executed 
in a forward-chaining strategy without computing useless 
information. A compilation technique forms the basis for 
this method rather than other techniques based on 
interpretation. 
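
Forward chaining over a recursive rule set can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the edge and path relations are hypothetical examples, and the code shows plain delta-driven forward chaining (each round joins only the newly derived tuples against the base relation), not the rule rewriting performed by the Alexander method itself.

```python
def transitive_closure(edges):
    """Forward-chain the rules
         path(x, y) <- edge(x, y)
         path(x, z) <- path(x, y), edge(y, z)
    joining only the tuples derived in the previous round (the delta)."""
    path = set(edges)
    delta = set(edges)
    while delta:
        # Join the delta with the base relation on the shared variable y.
        derived = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = derived - path      # keep only genuinely new facts
        path |= delta
    return path

# Hypothetical base relation: a chain a -> b -> c -> d.
edges = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(edges)
print(sorted(closure))
```

Because every round works only from the delta, no fact is rederived from older facts alone, which is the sense in which forward chaining can avoid some useless computation.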


The advantage is that following the execution model, the 
set of rules can further be compiled into a low-level lan¬ 
guage in which the parallelism is explicit. Each node of the 
DDC machine can directly execute this language, and two 
intermediate languages (Vim and DDCL). 

The DDC machine organization consists of four to 256 
identical nodes connected by a network. It is a “shared- 
nothing” architecture in the sense that no shared memory 
exists. Message-passing techniques accomplish all com¬ 
munications between the nodes. Each DDC node contains a 
general-purpose processor, a special coprocessor called Music, 
a large memory space, and a communication device. The 
Music coprocessor speeds up relational operations (its 
development was partially financed by ESPRIT project 956 
and a national project). 

DDC is currently simulated on a four-processor Unix ma¬ 
chine (SPS7-70). It demonstrates that the compiled approach 
permits the exhibition of parallel code from declarative lan¬ 
guages (SQL and Vim). This simulation very closely resembles 
the real prototype (eight processors with no shared memory) 
on which we’ve started the implementation of the DDC soft¬ 
ware. The characteristics of both the prototype and the simu¬ 
lation permit us to deduce the performance on the future 
DDC machine by performance modeling. 

The challenge in a parallel shared-nothing architecture is 
to provide for production of an efficient parallel code. In 
DDC, we consider this code efficient if the grain of the parallelism
is coarse (more than 1,000 instructions between two
communications).

The logic + functional approach. Different from the 
other subprojects, subproject D aimed to evaluate the possi¬ 
bility and the advantage of integrating the dominant 
declarative programming styles, logic (L) and functional (F). 

We’ve designed two languages: a first-order L+F inte¬ 
gration, which is both semantically well-defined and ef¬ 
ficiently implementable by a single computational mechanism 
for sequential and parallel implementations. The corre¬ 
sponding language K-Leaf, based on Horn Clause Logic 
with Equality, extends pure Prolog to express nontermi¬ 
nating, conditional term-rewriting systems with constructors. 

We defined Ideal (an Ideal DEductive and Applicative
Language), the first compiled, higher-order L+F language, as
a very high-level user language. Besides the usual
Prolog capabilities, it offers the most distinctive features
now present in modern functional languages: lazy evaluation,
type inference, and higher-order constructs.

Next, we devoted effort to the efficient implementation of 
the integrated computational model on generic sequential 
processors and on parallel architectures. We designed a con¬ 
servative extension of the WAM (Warren Abstract Machine) 
called K-WAM, to efficiently implement outermost resolu¬ 
tion. K-WAM, based on the extensive experience available 
for sequential and parallel Prolog compilation, is suited to 
incorporate all the new incoming optimizations. Extensions
to support the dynamic resolution strategy required by K- 
Leaf thus allowed the first compiled implementation of a sound 
L+F language. 

A strong synergy with ESPRIT project 26, which covered 
the architectural aspects, enabled the implementation of 
an L+F language on a real parallel machine. The machine 
embodies three main concepts: physically distributed/logically 
shared memory, very fast context switching built into the 
processing element, and support of efficient packet-switched, 
nonlocal communications. The machine is an ensemble of
up to 128 transputers fully interconnected by a low-latency
(2-µs transit time), high-throughput (10 Mbytes/s on each
port), “cut-through” packet-switching, buffered Delta
network implemented by means of a custom VLSI switching
element circuit containing about 30,000 CMOS (complementary
metal-oxide semiconductor) devices.

We found parallel execution models for L+F languages to 
be viable on distributed-memory architectures. For efficiency 
reasons, the programmer controls the granularity of parallel¬ 
ism, through special annotations and setlike constructs in a 
disciplined way. We prototyped a (deterministic And)-Or 
parallel model reduction. We finally ported a restriction of it, 
(independent And)-Or parallelism, to our parallel machine. 
We accomplished the porting by means of an original mapping
of And to Or parallelism, an approach that was independently
tried with success in the Giga-Lips Project.

We assembled most of the above-mentioned results in a 
working prototype of a small-configuration machine (16 pro¬ 
cessing elements). This machine runs conventional bench¬ 
marks (N-queens, Fibonacci numbers) and a few more realistic 
applications (grammar-based image recognition, logic simu¬ 
lation/fault finding). Written in And-Or parallel K-Leaf, they 
show quasilinear speed-ups at 75-90 percent of the ideal 
efficiency. 
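
As a rough check on these figures, the short Python sketch below converts the quoted efficiency range into speed-up values. It assumes the usual definition of parallel efficiency as speed-up divided by the number of processing elements; the article does not state its definition, so the resulting numbers are illustrative only.

```python
def speedup_from_efficiency(efficiency, processing_elements):
    """Speed-up implied by a parallel efficiency, assuming
    efficiency = speed-up / number of processing elements."""
    return efficiency * processing_elements

PROCESSING_ELEMENTS = 16   # size of the prototype reported in the text

low = speedup_from_efficiency(0.75, PROCESSING_ELEMENTS)
high = speedup_from_efficiency(0.90, PROCESSING_ELEMENTS)
print(f"speed-up between {low:.1f} and {high:.1f} on {PROCESSING_ELEMENTS} elements")
```

Under that assumption, the 16-element prototype would run the benchmarks roughly 12 to 14 times faster than a single element.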

The measured performance of the sequential K-WAM mapped
into C, on standard benchmarks, is:

• 80-300 percent of Quintus Prolog, 

• 50-200 percent of G-Machine-based Lazy ML, and 

• 20-80 percent of C used with all of its imperative 
features. 

This experience suggests that a reasonably good performance 
(160 KLIPS—thousands of logical inferences per second—on 
a Sun 3/280) can be obtained. 11 

The dataflow approach. The use of the dataflow prin¬ 
ciple for parallel execution has a long history. Since the early 
seventies, researchers have proposed a large number of 
dataflow computer architectures, and some of them have even 
been built, for example, the Manchester Data Flow Machine 
at Manchester University. But most of these systems exploit 
fine-grain parallelism and run number-crunching programs, 


thus leading to expensive special-purpose hardware. In con¬ 
trast, subproject E at Stollmann GmbH intended to build a 
dynamic dataflow machine suited to run commercial appli¬ 
cations and to exploit coarse and coarsest granularity. 

Stollmann chose a database application, the parallel evalu¬ 
ation of database queries. Because of our concept of variable 
granularity, designers could deviate from a highly special¬ 
ized hardware approach and start by building a demonstra¬ 
tor with off-the-shelf hardware. The dataflow control 
mechanisms are lifted to the software level, including distrib¬ 
uted parallel firing. The basic principles of the approach are 
load-distribution mechanisms aiming at exploitation of local¬ 
ity and dynamic load-balancing mechanisms such as task 
attraction. 

The designers formed an abstract architecture model, the
Stollmann Data Flow Machine. The SDFM units act as software
processes. The execution model consists of three different
units. Execution units execute the fired dataflow nodes,
that is, the executable instructions, and produce output operands.
The firing control units assign the produced output
operands to the corresponding instructions and fire the executable
instructions. The administration control unit performs
a special function; it monitors the system, starts and terminates
programs, and executes input and output instructions.
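
The division of labor between firing control and execution can be illustrated with a small, single-threaded Python sketch. Everything in it is an assumption made for illustration (the Instruction class, the token format, and the three-node example graph); it does not reproduce the SDFM software, only the general dataflow firing rule that an instruction runs once all of its operands have arrived.

```python
from collections import deque
import operator

class Instruction:
    """One dataflow node: an operation, its operand slots, and its consumers."""
    def __init__(self, op, arity, consumers):
        self.op = op
        self.arity = arity
        self.consumers = consumers      # list of (instruction name, operand slot)
        self.operands = {}

def run(program, tokens):
    pending = deque(tokens)             # tokens are (instruction name, slot, value)
    outputs = {}
    while pending:
        name, slot, value = pending.popleft()
        instr = program[name]
        instr.operands[slot] = value    # firing control: match operand to instruction
        if len(instr.operands) < instr.arity:
            continue                    # not all operands present yet
        args = [instr.operands[i] for i in range(instr.arity)]
        instr.operands = {}
        result = instr.op(*args)        # execution: run the fired instruction
        if instr.consumers:
            pending.extend((dest, s, result) for dest, s in instr.consumers)
        else:
            outputs[name] = result      # no consumers: a program output
    return outputs

# Hypothetical graph computing (2 + 3) * (7 - 4).
program = {
    "add": Instruction(operator.add, 2, [("mul", 0)]),
    "sub": Instruction(operator.sub, 2, [("mul", 1)]),
    "mul": Instruction(operator.mul, 2, []),
}
outputs = run(program, [("add", 0, 2), ("add", 1, 3), ("sub", 0, 7), ("sub", 1, 4)])
print(outputs)
```

In a coarse-grain setting, each operation would stand for a user-defined instruction of substantial size rather than a single arithmetic step.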

These units communicate asynchronously via buckets, a 
special kind of queue with a form of intelligence and access 
strategies. The buckets distribute messages to exploit locality;
the work must be done by the unit in which the corresponding
data are stored. Moreover, the system provides
dynamic load balancing to prevent exploitation of locality from
leading to nonoptimal load distribution. When a unit runs out of
work, it attracts work from other units. 
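
Task attraction can be sketched as a round-based simulation in Python. The Unit class, the choice of the most heavily loaded unit as donor, and the one-task-per-round schedule are all assumptions made for this illustration; the article does not describe the actual bucket access strategies.

```python
from collections import deque

class Unit:
    """A processing unit with its own bucket (work queue)."""
    def __init__(self, name):
        self.name = name
        self.bucket = deque()
        self.done = []

def attract(idle, units):
    """Move one task from the most loaded unit to the idle one."""
    donor = max(units, key=lambda u: len(u.bucket))
    if donor is idle or not donor.bucket:
        return False
    idle.bucket.append(donor.bucket.pop())   # take from the tail of the donor's bucket
    return True

def run(units):
    progress = True
    while progress:
        progress = False
        for unit in units:                   # each unit handles one task per round
            if not unit.bucket and not attract(unit, units):
                continue
            unit.done.append(unit.bucket.popleft())
            progress = True

# Locality places all eight tasks on the first unit; the others attract work.
units = [Unit(f"u{i}") for i in range(4)]
units[0].bucket.extend(range(8))
run(units)
print({u.name: u.done for u in units})
```

With all work initially placed on one unit, the three idle units pull tasks over in each round, so the load ends up evenly spread without any central scheduler.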

In programming the SDFM, designers used the Blass and 
Clan languages. A program written in Blass can be seen as 
the textual representation of a dataflow graph. Clan is a func¬ 
tional, single-assignment language similar to Sisal. To sup¬ 
port coarse-grain dataflow, designers introduced the concept 
of user-defined instructions into both languages. 

We implemented the SDFM model on our demonstrator, a 
multiprocessor configuration with four processor boards 
connected by a VMEbus. For the first implementation we 
chose 68020 processor boards equipped with dual-ported 
RAM, thus admitting a global address space. One processor 
board runs Unix Version 3 and acts as a host. The other 
boards run Srtx, a real-time operating system kernel with 
multitasking and dynamic memory management. Smocs, a 
common layer to all processor boards, provides a basic set of 
operating system functions for a multiprocessor system, es¬ 
pecially global queues and management of the local, but 
shared, memory. 

Clan realized our application, the parallel evaluation of 
database queries, by implementing the relational algebra 
operations as user-defined instructions. These operations 
Table 3. Languages and architectures of the subproject systems.

Subproject  Language      Architecture                       No. of nodes

A           POOL*         Pooma: Packet-switching            100
B           Miranda       Emulation on a transputer network  4
C           SQL, Vim*     DDC: Bus                           8
D           Ideal*        Transputer, multistage network     16
E           Sisal, Clan*  SDFM: Bus emulation                4
F           Lop*          SDFM: Bus emulation                4

*New language





processing complete relations split into parallel package ver¬ 
sions to introduce more parallelism. Our performance mea¬ 
surements show that speed-up depends largely on the grain 
size. An acceptable speed-up can be gained by a grain size 
of 0.1 seconds. Although there is no single bottleneck, the
system can be optimized to reach an optimal grain size of 0.05
seconds, and even finer grains are possible by introducing
special hardware.

The connection approach to logic programming. The main
goal of subproject F was an inference machine based
on the parallel, automated theorem prover Partheo 3 for full 
first-order predicate logic. The underlying proof calculus is 
model elimination, a specialization of the connection method 
developed by W. Bibel. The input language Lop for Partheo 
allows a straightforward declarative style of logic program¬ 
ming without being restricted to Horn Clause logic as in Prolog. 

The second goal of subproject F was to improve and 
implement the functional parallel programming language 
called FP2. FP2 provides parallel processes based on term 
rewriting and communication via unification. It is used as a 
high-level specification tool for the parallel algorithms used 
in Partheo. 

LIFIA, in Grenoble, France, developed the language FP2 
and provides the sequential implementation of FP2. The group 
at the Technical University of Munich worked on design and 
implementation of Partheo and Lop. The Nixdorf part of the 
project covered the parallel implementation of FP2, some 
contributions to the work on Partheo, and exploitation of 
project results. 

Setheo, the Sequential Theorem Prover, extends the 
Warren Abstract Prolog Machine to full first-order predicate 
logic. Setheo, implemented in C, yields a performance of 
about 120 KLIPS on a Sun 4 machine. On Unix machines, 
Setheo proved to be one of the fastest existing, high-perfor¬ 
mance theorem provers as well as an efficient Lop interpreter. 

Partheo, the Parallel Theorem Prover, runs on a network 
of 16 transputers. It solves independent parts of the search 


tree in parallel. Together with the fast abstract machine of 
Setheo, the parallel prover increases the performance of the 
inference machine. An easy-to-use graphical user interface 
facilitates the development and testing of Lop programs. 
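
The idea of exploring independent parts of a search tree in parallel can be sketched in Python. The example below is an assumption-laden stand-in: it uses the N-queens puzzle rather than a theorem-proving search, and a thread pool rather than a transputer network, so it shows only how first-level branches become independent tasks.

```python
from concurrent.futures import ThreadPoolExecutor

def count_solutions(n, placed):
    """Count completions of a partial placement; placed[r] is the
    column of the queen already placed in row r."""
    row = len(placed)
    if row == n:
        return 1
    total = 0
    for col in range(n):
        # A new queen must not share a column or diagonal with earlier ones.
        if all(col != c and abs(col - c) != row - r for r, c in enumerate(placed)):
            total += count_solutions(n, placed + (col,))
    return total

def parallel_count(n, workers=4):
    # Each choice for the first row roots an independent subtree of the search.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda col: count_solutions(n, (col,)), range(n)))

print(parallel_count(6), parallel_count(8))
```

In CPython the threads here do not run CPU-bound work truly in parallel; the sketch shows the decomposition of the search, not a speed-up.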

A sequential FP2 interpreter running on Sun workstations 
represents a powerful high-level language for specification, 
verification, and test of parallel algorithms. 

We implemented a high-performance FP2 interpreter on a
parallel VMEbus machine (the Stollmann test machine built 
by subproject E). 

An evaluation. When project 415 finished in 1989, each 
of the subprojects had delivered a prototype parallel system, 
executing programs in a language for the particular program¬ 
ming style. Table 3 summarizes these results. The design of a 
number of new languages (indicated by the asterisk in the 
table) reflects the improved understanding of the semantics 
of concurrency and its inclusion in the various symbolic pro¬ 
gramming styles. 

We knew the models of computation for sequential sym¬ 
bolic languages were quite different from imperative lan¬ 
guages. Therefore we assumed efficient implementations 
would require special hardware support, such as special CPUs 
or dedicated coprocessors. We directed considerable effort to
the design of abstract operational models to support the semantics
work and to guide the implementations. Interestingly,
these models are quite similar: they view a
program as a collection of processes and employ message 
passing as the means to achieve communication and syn¬ 
chronization. A process can implement an object (as in POOL) 
or a reduction task that evaluates function-arguments (as in 
Miranda). Processes and their communication patterns are 
either statically determined or dynamically created (POOL, 
Miranda, Lop). The latter case requires a runtime system to 
manage the allocation and scheduling of the required 
resources. 

Thus we could cleanly separate the issues of parallelism 
and of symbolic execution (within the otherwise sequential 
processes). Our efforts led to several novel compilation techniques
that result in efficient mappings of the diverse high-level
programming languages on standard von Neumann
microprocessor architectures, thus falsifying the hypothesis 
on special hardware support. Our implementation of POOL 
on sequential machines, for example, approaches the perfor¬ 
mance of an equivalent C program within a factor of two. 

This important conclusion means that parallel symbolic 
computers can directly benefit from the technology improve¬ 
ments in standard microprocessors, and that major invest¬ 
ments in VLSI development for processors are not necessary. 

A second conclusion is that the operational models avoid
notions of sharing. Thus they match a loosely coupled
architecture, which may consist of self-contained computers 
augmented with communication facilities. This also means 
that we restrict extra hardware facilities to the support of 
communication between processes. 

End-to-end routing devices that guarantee deadlock-free 
operation in sparsely connected topologies form the major 
innovation. These devices cope with the dynamic creation of
processes and communication patterns, and offload the 
transmission overhead from the CPU. The Pooma communi¬ 
cation processor employs a packet-switching method and 
distinguishes itself from other devices (chiefly, the iPSC) by 
very efficiently handling short messages between objects. 12 

At least two areas of research to further improve the effi¬ 
ciency of execution deserve further attention. First is the is¬ 
sue of granularity. Related to the nature of symbolic data and 
operations, the processes tend to be very lightweight (several
orders of magnitude smaller than in Unix) and require
inexpensive mechanisms for scheduling and resource alloca¬ 
tion. Second, when many processes reside on the same node, 
a “postman” device could place incoming messages in the 
right queues and perform the administration tasks. This de¬ 
vice would reduce communication costs further. 

Our final observations relate to the parallel applications 
that were investigated in the project. They indicate that paral¬ 
lel implementations of symbolic algorithms can be expressed 
very well in the newly designed languages, and that we can 
achieve good speed-up, absolute performance, and 
expandability. 

Pooma 

The following describes in more detail the developments 
in ESPRIT subproject A. This subproject included the parallel 
object-oriented language POOL; 13 the parallel object-oriented 
machine Pooma; and applications written in this language 
and running on this machine. 14-17

POOL allows the programmer to express parallelism ex¬ 
plicitly by specifying objects that run in parallel with each 
other and communicate via messages. Note that this work 
has been carried out in a close cooperation between Project 
415-A and Prisma, a Dutch national project. Philips Research 


Laboratories worked together with the Centre for Mathemat¬ 
ics and Computer Science in Amsterdam and the universities 
of Amsterdam, Leiden, Twente, and Utrecht. 15 

Pooma is a machine with a loosely coupled MIMD 
(multiple-instruction, multiple-data) architecture. 18 It is com¬ 
posed of many nodes, each of which consists of a data pro¬ 
cessor, memory, a communication processor, and I/O devices. 
The communication processors interconnect with serial, bi¬ 
directional, point-to-point connections (links). The network 
of communication processors provides an end-to-end, dead- 
lock-free, packet-delivery service to the data processors of 
Pooma. 

The POOL language model and the Pooma machine model 
correspond closely. Each node executes a number of objects, 
which communicate by messages that are sent over the network
of communication processors in the form of packets. A
compiler and an operating system (runtime support) map a 
POOL program onto the architecture. 19 The operating system 
directly supports processes and messages as POOL requires. 

We aimed at symbolic processing, numerical computing, 
and data-intensive applications such as databases and docu¬ 
ment retrieval systems. Currently we’ve placed the most em¬ 
phasis on data-intensive applications for the office 
environment. 14 

Language 

We directed the effort in language design at exploiting the 
ideas of object-oriented programming to support the efficient 
programming of highly parallel systems. We did not intend 
the resulting language (POOL) to be a tool for rapid 
prototyping but a language for the systematic construction of 
large, reliable systems. We outline POOL here. 

Objects. In object-oriented programming, programmers 
view an executing program as a collection of objects. An object 
is an integrated unit of data and procedures that can act on 
these data. Variables store data; the procedures belonging to 
an object are called methods. Each variable can store a refer¬ 
ence to an object. The data of an object are not directly ac¬ 
cessible to any other object: Objects can only interact by 
exchanging messages. 

A special characteristic of POOL is that, as soon as an object
is created, it starts executing its body, a local process.
Different objects execute in parallel and may interact by ex¬ 
plicit sending and answering of messages. Within an object 
all activities occur sequentially; having parallelism inside ob¬ 
jects would cause problems by necessitating additional 
mechanisms to synchronize processes. 13 

Classes. Objects are created by other objects. An object 
can modify its own data and can have its own independent 
internal activity. To describe this unbounded number of dy¬ 
namic objects in a static, finite program, the programmer 
defines objects by grouping them into classes. All the ob¬ 
jects in one class (the instances of the class) have exactly 
the same structure. That is, they have the same names of
variables (though each one has its own set of variables),
and they use the same methods to respond to messages. A
class definition contains all information that is relevant to
the instances of the class and serves as a blueprint for the
creation of new instances. POOL provides class parameterization,
allowing programmers to define generic classes (for
example, a generic class describing arbitrary stacks). Different
instances of such a generic class can be created by filling
in a concrete class name for the parameters (a stack of
integers).
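
For readers more familiar with current languages, class parameterization can be approximated in Python with typing.Generic. This is only an analogy: the Stack class and its methods are hypothetical, and Python does not enforce the type parameter at runtime, whereas a statically typed language would check it at compile time.

```python
from typing import Generic, List, TypeVar

Element = TypeVar("Element")   # plays the role of the class parameter

class Stack(Generic[Element]):
    """A generic stack; Element is filled in when the class is used."""
    def __init__(self) -> None:
        self._items: List[Element] = []

    def push(self, item: Element) -> "Stack[Element]":
        self._items.append(item)
        return self                 # returning self allows chained pushes

    def pop(self) -> Element:
        return self._items.pop()

# Filling in a concrete class for the parameter: a stack of integers.
int_stack: Stack[int] = Stack[int]()
int_stack.push(1).push(2)
top = int_stack.pop()
print(top)
```

A static type checker would reject int_stack.push("text") here, which is the closest Python counterpart to instantiating a generic class with a concrete class name.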

We do not consider classes to be objects themselves, to
which one could send messages, as is the case in Smalltalk-80.
Nevertheless, we clearly need activities that are associated
with a class, instead of with an object (for example, the creation of
new objects). Therefore POOL introduces the concept of
routines. A routine is a procedure that is associated with a
class (unlike a method, which is associated with a particular
object). Routines can be executed in principle by any object
in the system, or even by several objects at the same time.

POOL uses a strong typing mechanism. With each variable
a class, its type, is associated; it may only refer to objects of
that class. Every expression in the language has a type,
which indicates the class of objects it can deliver. This capability
makes it possible to detect many programming errors
before the program is executed and facilitates compiler
optimizations.

Communication. An object indicates explicitly to which
destination object it wants to send a message. Executing the
statement v ! put(56) sends a message identifying the method
(put) to the object whose reference is contained in the variable
v. The message also contains the integer parameter 56.
The sender waits until the receiver processes the message.

Through the statement ANSWER (put, get) the receiver
indicates that it wants to answer a message that indicates
either the put or get method. The receiver takes the first
such message to arrive. If no such message is available when
the Answer statement executes, the object waits.

Another provision, the Conditional answer statement (for
example, CONDANS (put, get)), does not block the executing
object when no matching message is found. When a
message is answered, the object executes the corresponding
method, providing it with the parameters in the message.
At a certain moment (not necessarily the end), this
method returns a result to the sending object, so that this
object can continue its activities. The period between the start
of the method execution and the returning of the result is
called rendezvous.

Apart from this synchronous communication mechanism,
indicated by the “!” sign (in which the sender waits until
the destination returns a result to end the rendezvous),
asynchronous communication exists, indicated by the “!!”
sign. With the latter, the sender does not wait for a result
but immediately continues its own activities. The method,
executed by the receiver on accepting the message, does
not return a result.
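
The difference between the two kinds of send can be sketched in Python with a thread and a mailbox. The Counter class and its send and send_async methods are hypothetical names chosen for this illustration. One simplification is labeled in the code: the reply is delivered only after the method finishes, whereas the text notes that a POOL method may return its result before it ends.

```python
import queue
import threading

class Counter:
    """An active object: its body runs in its own thread and
    answers one message at a time from a mailbox."""
    def __init__(self):
        self._mailbox = queue.Queue()
        self.total = 0
        threading.Thread(target=self._body, daemon=True).start()

    def _body(self):
        while True:
            method, arg, reply = self._mailbox.get()
            result = getattr(self, "_" + method)(arg)
            if reply is not None:
                reply.put(result)   # simplification: reply only after the method ends

    def _add(self, n):
        self.total += n
        return self.total

    def send(self, method, arg):
        """Synchronous send (like "!"): block until the result arrives."""
        reply = queue.Queue(maxsize=1)
        self._mailbox.put((method, arg, reply))
        return reply.get()

    def send_async(self, method, arg):
        """Asynchronous send (like "!!"): continue at once, no result."""
        self._mailbox.put((method, arg, None))

counter = Counter()
counter.send_async("add", 5)        # the sender does not wait
result = counter.send("add", 1)     # the sender waits for the result
print(result)
```

Because the mailbox is first-in, first-out and there is a single sender, the asynchronous message is handled before the synchronous one, so the synchronous call sees both additions.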

POOL models all program data through objects, even basic
data types such as integers and Booleans. Message
exchanges model all object interactions. To allow for a
more compact coding, POOL makes available various syntactic
“sugar” constructs (abbreviated notations) to its programmers.
For example, the expression number - 1 is
identical to the send-expression number ! sub(1); syntactic
sugar constructs exist for most arithmetical notations. The
expression array[out] is syntactic sugar for the send expression
array ! get1(out), which retrieves a value from a one-dimensional
array. However, when such a bracket notation
occurs at the left-hand side of an assignment, it has a different
meaning. The statement array[in] := e is equivalent to
array ! put1(in, e). This expansion mechanism for syntactic
sugar constructs works not only for basic classes like
Int and Array but also for programmer-defined classes.
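
Python's operator hooks give a rough analogy for this kind of expansion. In the sketch below, the method names sub, get1, and put1 mirror the send expressions in the text as best they can be read from it; the Obj, Int, and Array classes and the send helper are hypothetical and exist only for this illustration.

```python
class Obj:
    def send(self, method, *args):
        """Stand-in for a message send: look up the method and call it."""
        return getattr(self, method)(*args)

class Int(Obj):
    def __init__(self, value):
        self.value = value

    def sub(self, other):
        return Int(self.value - other.value)

    def __sub__(self, other):               # number - 1  becomes  number ! sub(1)
        return self.send("sub", other)

class Array(Obj):
    def __init__(self, low, high):
        self.low = low
        self.cells = [None] * (high - low + 1)

    def get1(self, index):
        return self.cells[index - self.low]

    def put1(self, index, element):
        self.cells[index - self.low] = element
        return self

    def __getitem__(self, index):           # array[i]  becomes  array ! get1(i)
        return self.send("get1", index)

    def __setitem__(self, index, element):  # array[i] := e  becomes  array ! put1(i, e)
        self.send("put1", index, element)

array = Array(1, 3)
array[2] = Int(7)
difference = array[2] - Int(1)
print(difference.value)
```

As in the text, the sugar is not tied to built-in classes: any programmer-defined class that supplies the underlying methods gets the short notation.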

Units. A POOL program consists of a series of units, which 
come in pairs of specification and implementation. An 
implementation unit contains a number of class definitions. 
The corresponding specification unit lists a subset of these 
definitions, which can be used in another unit when the name 
of the used unit appears in the use list. Thus the programmer 
can hide elements explicitly from other units, enhancing 
abstraction and improving reusability of parts of POOL 
programs. 


SPEC UNIT Bounded_Buffer
CLASS Buffer(Element)
  %% FIFO buffer storing instances of class Element.

  ROUTINE new (size : Int) : Buffer(Element)
  %% Creates a new buffer which may contain 'size' elements.

  METHOD put (e : Element) : Buffer(Element)
  %% Adds the element 'e' to the buffer.

  METHOD get ( ) : Element
  %% Extracts the first element from the buffer.

END Buffer


Figure 1. Specification unit Bounded_Buffer. 

An example program. Figure 1 contains a simple, sample 
specification unit describing a generic class Buffer, whose 
objects function as a bounded buffer of elements of a given 
parameter type called Element. Figure 2 presents the corresponding
implementation unit, defining the Buffer class.

IMPL UNIT Bounded_Buffer

CLASS Buffer(Element)

  NEWPAR (size: Int)

  VAR cont := Array(Element).new(1, size)
      in := 1
      out := 1
      number := 0

  METHOD put (e: Element) : Buffer(Element)
  BEGIN RESULT SELF;
    cont[in] := e;
    in := (in // size) + 1;
    number := number + 1
  END put

  METHOD get ( ): Element
  BEGIN RESULT cont[out];
    out := (out // size) + 1;
    number := number - 1
  END get

  BODY DO IF number = 0 THEN ANSWER (put)
          ELSIF number = size THEN ANSWER (get)
          ELSE ANSWER (put, get)
          FI
       OD
  YDOB
END Buffer


Figure 2. Implementation unit Bounded_Buffer. 

The Newpar clause specifies the parameters for the rou¬ 
tine New, which creates new objects of this class. This rou¬ 
tine first creates a new object of the class and then sends this 
object an initializing message containing the new parameters. 
The first thing the new object does is answer this initializing 
message. It then initializes the instance variables. After that, 
the new object starts to execute its body. The body consists 
of an infinite loop (the loop condition, which is absent here, 
defaults to true) in which it explicitly answers incoming mes¬ 
sages. Depending on whether the buffer is empty (number 
equals 0) or full (number equals size), or neither empty nor 
full, the object answers methods put or get, or either. If no 
such message arrives, the object waits. 
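
The conditional answering in the body of Figure 2 can be approximated in Python. In this sketch a condition variable and threads stand in for POOL's message answering, which is a substitution rather than a translation: a caller of `put` blocks while the buffer is full, and a caller of `get` blocks while it is empty. The helper `run_demo` and the zero-based indices are assumptions for illustration.

```python
import threading


class Buffer:
    """Bounded FIFO buffer; the guards mirror the ANSWER choices in Figure 2."""

    def __init__(self, size):
        self.size = size
        self.cont = [None] * size
        self.inp = 0          # next slot to write ('in' is a Python keyword)
        self.out = 0          # next slot to read
        self.number = 0       # elements currently stored
        self.cond = threading.Condition()

    def put(self, e):
        with self.cond:
            # put is accepted only while the buffer is not full
            self.cond.wait_for(lambda: self.number < self.size)
            self.cont[self.inp] = e
            self.inp = (self.inp + 1) % self.size
            self.number += 1
            self.cond.notify_all()
        return self

    def get(self):
        with self.cond:
            # get is accepted only while the buffer is not empty
            self.cond.wait_for(lambda: self.number > 0)
            e = self.cont[self.out]
            self.out = (self.out + 1) % self.size
            self.number -= 1
            self.cond.notify_all()
        return e


def run_demo(count=100, size=5):
    """One producer and one consumer sharing a buffer (hypothetical helper)."""
    buff = Buffer(size)
    received = []
    producer = threading.Thread(
        target=lambda: [buff.put(i) for i in range(1, count + 1)])
    consumer = threading.Thread(
        target=lambda: received.extend(buff.get() for _ in range(count)))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return received
```

Because the buffer is FIFO and there is a single producer and a single consumer, `run_demo()` returns the integers in the order they were produced.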

Figure 3 indicates how the class Buffer can be used. A 
POOL program starts its execution by evaluating in parallel 
the expressions defining the global values of all implementation
units. Here the program evaluates the expressions initializing
the three global values prod, cons, and buff in the
unit Root, starting an object of each of the classes Buffer, 
Producer, and Consumer. The Producer object then starts 
sending a stream of integers to be stored in the Buffer object, 
which will be emptied by the Consumer object. The program 
terminates when all activity in the system has ceased. In the 
example, termination occurs when the consumer has printed 
all values retrievable from the buffer. 


IMPL UNIT Root

USE Bounded_Buffer, File_IO

GLOBAL prod := Producer.new( )
       cons := Consumer.new( )
       buff := Buffer(Int).new(5)

CLASS Producer
  BODY
    FOR i TO 100 DO buff ! put(i) OD
  YDOB
END Producer

CLASS Consumer
  BODY
    DO standard_out ! write_Int(buff ! get( ))
    OD
  YDOB
END Consumer


Figure 3. Implementation unit Root. 

Hardware architecture 

As mentioned earlier, the loosely coupled MIMD Pooma 
machine consists of a network of computing nodes with dis¬ 
tributed memory. A packet-switching, point-to-point network 
allows nodes to communicate. The available communication 
bandwidth accommodates hundreds of nodes without re¬ 
quiring interconnection links with a very high bandwidth. 

Node architecture. Figure 4 depicts the structure of a 
Pooma node. It consists of a data processor, memory sub¬ 
system, I/O interface, and communication processor. 

The data processor (DP) executes the code of the POOL 
objects residing on the node. As objects are implemented as 
control-flow processes, a normal von Neumann architecture 
suffices for the DP. Each Pooma node can execute several 
(some 10 to 100) processes. The processor architecture, 
therefore, supports multitasking, and efficient process 
switching is a major requirement. 

The memory stores the code and data of the operating 
system (runtime support) as well as the code, stacks, and 
message queues of the objects residing on the node.

Figure 4. The structure of a Pooma node.

interfaces, not necessarily available on all nodes,
LAN and disk controllers. They can be connected 
the bus of the data processor. We thought this 
to be general enough to handle other types of in- 
a similar fashion. 

decided to connect I/O interfaces to the node 
had further decisions to make. It remains to be 
whether to equip all nodes with I/O interfaces, to
possibility on every node to be extended with
ces, or to equip only particular nodes with actual
or provisions for them. Basically, we face a cost/
ce trade-off here. For inexpensive, standard I/O
we think it reasonable to provide every node with
terface to disk or tape unit, serial terminal connec-
even Ethernet). For more expensive interfaces a
bus extension (VMEbus or Multibus) seems a better

does not require special facilities such as I/O pro-
some data processors can reasonably spend a part
servicing I/O interfaces. Every DP can access all
interfaces by passing messages over the network,
network provides a very high bandwidth, only its
be an obstacle.

Communication processor. An important aspect of
tralized computer architecture is its communica-
port. The CPs in Pooma together establish a commu-
system by which every data processor can
nicate with every other data processor through packet-
methods.

Figure 5 shows the way DPs and CPs connect together.
ation takes place in terms of fixed-size, 256-bit
information. Each packet contains an 11-bit desti-
dress field. When a source DP sends a data packet
ation DP, the packet first moves to the local CP via
P-to-CP connections to the CP on the destination
node. Finally it transfers to the destination DP. Each CP uses
the destination address, part of the packet, to determine the 
path through the network. The connection between the DP 
and the CP is identical to the connection between two neigh¬ 
boring CPs, except that it is implemented as a 32-bit parallel 
path instead of a serial connection. The CP meets the follow¬ 
ing requirements: 

• Independent operation. In particular the data processor 
need not be involved in the forwarding of packets not 
yet arrived at their destination. 

• Absence of deadlock and starvation. The arrival of 
packets at their destination is guaranteed (provided the 
destination DP does not stop consuming incoming 
packets). This means that the CP avoids cyclic wait-for 
relations between packets. 

• Dynamic and static routing modes. In the dynamic 
routing mode packets transfer to a destination via differ¬ 
ent routes. The main purpose of this dynamic routing is 
to balance the packet load of the communication net¬ 
work and to optimize use of the large total bandwidth. 
The disadvantage of dynamic routing is that the order in 
which a source sends packets to a particular destination 
and the order in which the destination receives them are 
not guaranteed to be the same. Use of the communica¬ 
tion processor in the static routing mode guarantees or¬ 
der preservation of the stream of packets between source 
and destination communication processors. 

• Independence of network topology 20 or network size. 

• Implementable in one VLSI chip. 

• Efficient usage and administration of packet storage 
space. 

• High data throughput. 



Figure 5. A small instance of the Pooma architecture, 
consisting of five data processors connected by five 
communication processors.

Figure 6. Front view of the Pooma prototype.


The routing function in the CP uses a routing table that is 
downloaded into the CP upon initialization of the Pooma 
hardware. When a packet arrives in a CP, the CP checks the 
routing table with the destination address of the packet to 
find the routing vector for the packet. This routing vector is a 
vector of bits, one bit per CP-to-CP output of the CP. Each bit 
indicates whether or not the packet may be forwarded via 
the corresponding output. When multiple bits are true, dif¬ 
ferent paths may be taken. When all bits are false, the packet 
has reached its destination and must be forwarded on via the 
CP-to-DP output. 

A queue of packets accompanies each output of a CP. 
When a packet arrives in the CP, it resides internally and 
joins the queues associated with the outputs, which are indi¬ 
cated by the bits in the routing vector of the packet. When an 
output wants to transmit a packet, it takes the first packet 
from the queue associated with that output, removes the 
packet from all queues, and transmits. 
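
The interplay of the routing table and the per-output queues can be sketched as a toy model. The class and method names below are hypothetical, packets are plain dictionaries, and the sketch leaves out the 255-buffer limit and the deadlock-avoidance strategy, so it illustrates only the lookup and the multiple-queue bookkeeping.

```python
from collections import deque


class CommunicationProcessor:
    """Toy model of one CP: a routing table plus one queue per CP-to-CP output."""

    def __init__(self, routing_table, num_outputs):
        # routing_table maps a destination address to a routing vector:
        # a tuple of booleans, one per CP-to-CP output.
        self.routing_table = routing_table
        self.queues = [deque() for _ in range(num_outputs)]
        self.to_dp = deque()      # queue for the CP-to-DP output

    def accept(self, packet):
        """Store an arriving packet in every queue its routing vector allows."""
        vector = self.routing_table[packet["dest"]]
        if not any(vector):
            # All bits false: the packet has reached its destination node.
            self.to_dp.append(packet)
            return
        for output, allowed in enumerate(vector):
            if allowed:
                self.queues[output].append(packet)

    def transmit(self, output):
        """Called when an output is ready: send the first queued packet, if any."""
        if not self.queues[output]:
            return None
        packet = self.queues[output][0]
        # The packet leaves through this output only, so remove it from
        # every queue it had joined (compared by identity, not by content).
        for i, queue in enumerate(self.queues):
            self.queues[i] = deque(p for p in queue if p is not packet)
        return packet
```

A routing vector with a single true bit fixes the path, which corresponds to static routing; several true bits let whichever output becomes free first take the packet, which is the dynamic case.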

The concepts of a routing table and multiple queues to¬ 
gether allow the required independence of topology and size 
of the network and the required dynamic routing. The inter¬ 
nal packet storage of a CP contains 255 buffers in each of 
which one packet can be stored. To meet the requirement of 
absence of deadlock and starvation, we introduced a new 
strategy called class climbing, which Annot 21 describes in detail. 
The nodes of the current Pooma prototype contain a breadboard
version of the CP. Currently, we are developing a VLSI
version of the CP.


The 100-node prototype. We constructed the Pooma 
prototype in a very modular way using standard construction 
materials (see Figure 6). Seven cabinets house the whole 
machine. The machine contains 50 disks, which occupy two 
cabinets on the outer sides. Each disk contains its own SCSI 
interface for connection to the processor board of a node. 
This disk organization makes the Pooma prototype extremely 
suitable for data-intensive applications like databases and 
document retrieval systems. It has a large background memory
capacity (15 Gbytes in total) and a high bandwidth to background
memory (50 Mbytes/s in total). Because the machine is decentralized,
its built-in redundancy allows for fault-tolerant designs.

Four cabinets house the 100 nodes. One cabinet contains 
five crates and each crate contains five nodes. The hardware 
of a node is built around the VMEbus, as seen in Figure 7. 
The Motorola 68020 CPU and 68881 floating-point unit 
implement the DP, while the 68851 functions as the memory 
management unit. Each node contains 16 Mbytes of memory. 

The breadboard CP supports four bidirectional links, each
of which consists of two unidirectional links running at 20
Mbits/s. The processor, Ethernet, memory boards, and CP 
connect to the VMEbus. The boards are commercially avail¬ 
able; the CP is a custom design. Note that one crate houses 
five separate VMEbuses, one for each node. Only one node 
in each crate has an Ethernet board used for downloading 
code and inputting and outputting to the host computer. 

The total main memory capacity of the Pooma prototype is 
1.6 Gbytes. This large main memory encourages the use of
special techniques (main memory database, inverted file
method) to improve the performance of data-intensive 
applications. 
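
The prototype totals quoted in the text can be cross-checked with a few lines of arithmetic. The per-disk averages at the end are derived values that the text does not state, and decimal units (1 Gbyte = 1,000 Mbytes) are an assumption.

```python
# Figures quoted in the text.
node_cabinets = 4
crates_per_cabinet = 5
nodes_per_crate = 5
memory_per_node_mbytes = 16
disks = 50
disk_capacity_total_gbytes = 15
disk_bandwidth_total_mbytes_s = 50

crates = node_cabinets * crates_per_cabinet
nodes = crates * nodes_per_crate
main_memory_gbytes = nodes * memory_per_node_mbytes / 1000  # decimal units assumed
ethernet_boards = crates                                     # one board per crate

# Derived per-disk averages (not stated in the text).
capacity_per_disk_mbytes = disk_capacity_total_gbytes * 1000 / disks
bandwidth_per_disk_mbytes_s = disk_bandwidth_total_mbytes_s / disks
```

The results agree with the stated 100 nodes, 1.6 Gbytes of main memory, and 20 Ethernet boards.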

The four node cabinets are grouped around the central 
switch cabinet. We added this switch unit to the prototype for 
experimental (and flexibility) reasons only; it is not a funda¬ 
mental part of the Pooma decentralized memory architecture. 
The switch splits the 100-node machine into a number of 
smaller machines (arbitrary multiples of five nodes), so that 
more users can access the machine at the same time. This 
switch also allows experiments with different kinds of network 
topologies. 20 Another feature of the switch is that a faulty 
node or link can be repaired (or replaced) while the rest of 
the prototype remains operational. We simply configure the 
switch such that the faulty parts are not allocated to users of 
the machine. 

The interface with the outside world occurs through one 
Ethernet cable to which the 20 Ethernet boards of Pooma 
connect. These boards run standard Ethernet software (TCP/IP).
We’ve designed a software environment that allows users
on the LAN to use the prototype from their own workstations. 
The software environment also controls the resources (nodes 
and switch) of the prototype and allocates clusters of nodes 
to users on request. Users specify the number of nodes and 
the kind of network topology. A request waits in a queue 
when the number of requested nodes is higher than the num¬ 
ber of free nodes in the prototype. Every time the software 
environment allocates nodes and frees nodes, it automati¬ 
cally resets these nodes so the next user has a clean machine. 
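
A minimal sketch of such an allocation policy follows. The class `NodeAllocator`, the first-come, first-served queue order, and the reset counter are assumptions for illustration; the text states only that requests wait in a queue and that nodes are reset on allocation and release. Topology selection is left out.

```python
from collections import deque


class NodeAllocator:
    """Hands out clusters of nodes and queues requests that cannot be met yet."""

    def __init__(self, total_nodes=100, granule=5):
        self.total = total_nodes
        self.granule = granule      # the switch partitions in multiples of five
        self.free = total_nodes
        self.waiting = deque()      # FIFO order is an assumption
        self.granted = {}           # user -> number of nodes held
        self.resets = 0             # counts reset operations (illustrative)

    def request(self, user, nodes):
        if nodes <= 0 or nodes % self.granule or nodes > self.total:
            raise ValueError("invalid cluster size: %d" % nodes)
        self.waiting.append((user, nodes))
        self._dispatch()

    def release(self, user):
        nodes = self.granted.pop(user)
        self.resets += 1            # nodes are reset when they are freed
        self.free += nodes
        self._dispatch()

    def _dispatch(self):
        # Serve the head of the queue while enough nodes are free.
        while self.waiting and self.waiting[0][1] <= self.free:
            user, nodes = self.waiting.popleft()
            self.free -= nodes
            self.resets += 1        # nodes are reset when they are allocated
            self.granted[user] = nodes
```

With 100 nodes, a request for 60 is granted at once, while a following request for 50 waits until the first cluster is released.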

Performance issues. Van Beek, van Twist, and Vlot 12 
describe experiments that show the Pooma communication 
network supports excellent node-to-node message passing, 
even in situations with very aggressive communication loads. 
Experiments included those on: 

• Communication load. Included are the number of 
packets produced per second by each node in the sys¬ 
tem; burst versus no-burst traffic; and uniform versus 
hot-spot packet distribution. Packets within one burst 
transfer to the same destination and form a message 
together. These packets transfer into the CP of the send¬ 
ing DP at a very high speed. In the no-burst situation all 
messages contain just one packet. Uniform packet dis¬ 
tribution means that any network node can become the 
destination of a packet with equal probability. Hot-spot 
packet distribution means that one node has a higher 
probability of being chosen as the destination of a packet 
than the other nodes. 

• Routing strategy. Dynamic or static strategies exist. 

• Network topologies. Torus, chordal ring, and extended 
chordal ring topologies are possible. 12 
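
The traffic patterns named above can be generated synthetically. The sketch below is not the cited experimental setup: the function names and the `hot_fraction` parameter are assumptions, chosen as one simple way to give a single node a higher probability of being the destination.

```python
import random


def pick_destination(rng, num_nodes, hot_node=None, hot_fraction=0.0):
    """Choose a packet destination.

    With hot_fraction = 0 every node is equally likely (uniform traffic).
    Otherwise hot_node receives an extra share hot_fraction of all packets,
    and the remainder is spread uniformly (hot-spot traffic).
    """
    if hot_node is not None and rng.random() < hot_fraction:
        return hot_node
    return rng.randrange(num_nodes)


def make_burst(rng, num_nodes, packets_per_burst, **kwargs):
    """All packets of one burst share a destination and form one message."""
    dest = pick_destination(rng, num_nodes, **kwargs)
    return [dest] * packets_per_burst
```

A no-burst load corresponds to `packets_per_burst=1`, and passing a seeded `random.Random` instance makes a run repeatable.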

The experiments measured packet delay and packet
throughput, both measures of network performance. We
concluded the following from the experiments:

• Throughput. In high-network-load situations of up to 80 
percent link occupation, the network does not incur much 
extra delay when compared to low traffic situations. An 
exception occurs in burst traffic situations when few al¬ 
ternative paths exist. 

• Dynamic routing. In all cases dynamic routing dimin¬ 
ishes network delay, and in the case of burst traffic it 
significantly increases the maximum throughput. 

• Topology. When using static routing or when expecting
little burst traffic, a topology with a low average internode
distance is most suitable. In burst traffic situations
the use of more alternative shortest paths more than 
compensates for the increase in average distance. 

• Hot spots. The network behaves extremely well under 
hot-spot loads. Until about 80 percent occupation on 
the highest loaded links, the average delay of a packet 
hardly increases when compared with the delay in low- 
load situations. 

• Scalability. As long as the average link occupation remains
below 80 percent of its maximum, the delay increases only
slightly as the number of nodes grows. There
seems to be no reason that this network concept cannot 
be extended to several hundreds of nodes. 

• Comparison. We found it difficult to compare the Pooma
network with other machines for two reasons. The principles
on which the machines are based differ, and no measurements
exist for these machines in the situation for which the Pooma
network was constructed: high throughput for many small
messages. Compared to the iPSC/2, 22,23 the Pooma network is
better suited for routing many smaller messages and for
quickly varying communication patterns. The iPSC/2 is better
suited for transferring large messages and for more static
communication patterns.

System software 

The system software executing POOL programs on Pooma 
hardware consists of a compiler and a dedicated operating 
system. The compiler checks the POOL code and generates 
an executable program. The operating system offers an ab¬ 
straction of the hardware toward the generated code and 
also takes over most distributed POOL tasks from the com¬ 
piler. Furthermore, the operating system handles download¬ 
ing and execution of the program and offers some facilities 
for performance analysis and debugging. 

The compiler. The compilation of POOL programs con¬ 
sists of several steps. First the compiler checks the POOL 
code against syntactic and semantic errors. Next the compiler 
uses an intermediate language to compile the program into 
assembly code. The intermediate language describes a stack

Server objects. These objects do not run in parallel with
objects that send messages to them. They simply 
lor a message, execute the corresponding method, 
the result to the sender, and then wait for the next 
ge. In other words a server object does not need a 
fledged process for itself. Instead, messages sent to
by an object on the same node can be processed
‘serve-yourself’ mechanism: The sender executes
ethod code within its own process. In this way, we
replace the expensive message-passing mechanism
an inexpensive procedure call. We only need a
phore to prevent several senders from executing
ods in the same object simultaneously. (For pro-
messages sent from other nodes, special system
sses execute the serve-yourself mechanism.)

Data objects. These objects obey the same restrictions as
objects. In addition their methods are atomic, that
ly can execute from beginning to end without
ing scheduling in between. Data objects do not need
emaphore that server objects use to control the ac-
their methods. In many cases, the procedure call
n replaced by in-line code. An example of a data
is a recordlike object with only put and get meth-
accessing the variables. Several built-in classes
as arrays are also of this type.

Value objects. In addition to satisfying the restrictions of
objects, value objects guarantee that their internal
never change after their creation. They are immu-
Programmers can replicate these objects without
ing the semantics of a program. Whenever a refer-
to a value object is included as a parameter in an
node message, the runtime system copies the com-
object. In this way, access to value objects is al- 
guaranteed to be local and therefore inexpensive, 
built-in classes are of this kind, for example, inte- 
strings, tuples. 




Only for the remaining kind of objects (we call them process
objects) does the compiler generate full-fledged, but
lightweight, processes that execute the objects’ bodies. Since
server, data, and value objects do not each need a process,
many more objects than processes can exist on a node.
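
The serve-yourself idea for server objects can be sketched with Python threads standing in for sender processes. The class `ServerObject` and its `add` method are hypothetical. The point is that the object has no thread of its own: each sender runs the method body itself, and a semaphore keeps two senders from executing methods of the same object at once.

```python
import threading


class ServerObject:
    """Has no process of its own; callers run its methods under a semaphore."""

    def __init__(self):
        self._semaphore = threading.Semaphore(1)   # one sender at a time
        self.total = 0

    def _add(self, amount):
        # The method body; it is not safe without the semaphore.
        current = self.total
        self.total = current + amount
        return self.total

    def send_add(self, amount):
        """'Serve yourself': the sender executes the method in its own thread."""
        with self._semaphore:
            return self._add(amount)


def run_senders(server, senders=4, messages=1000):
    """Several senders call the same server object concurrently."""
    def sender():
        for _ in range(messages):
            server.send_add(1)

    threads = [threading.Thread(target=sender) for _ in range(senders)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return server.total
```

Each call costs a semaphore operation plus a procedure call rather than a message exchange between two processes.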


Since the performance gained by using specialized object 
implementations can be very large (see later), we decided 
that the compiler should not perform these optimizations
unless the programmer is aware of them. Instead, the
programmer indicates the intended object type by a pragma, a
kind of comment to the compiler. The compiler then checks 
for confirmation that the restrictions are satisfied and only 
then performs the optimization. 

The operating system. We specially constructed the op¬ 
erating system for execution of POOL programs. It roughly 
consists of three parts: 

• The nucleus, which takes care of the resources offered
by the Pooma hardware, that is, the memory, CPU, CP,
and optional disk and Ethernet.

• The POOL support, which handles most of the distributed
tasks related to POOL, most notably message handling,
garbage collection, and object allocation.

• The program support, which provides, among other
facilities, program downloading, a monitor that graphically
displays the execution, and profiling and debugging
tools.

Because of the high rate of allocations and deallocations, 
we made the functionality of the memory manager as simple 
as possible. It only supports simple alloc and free primitives. 
Disk swapping is eliminated: Main memory stores all data. 

We paid special attention to scheduling, because it occurs 
very frequently. We chose to support only lightweight pro¬ 
cesses. Furthermore, scheduling occurs only on request. As a 
consequence the compiler schedules as needed and tries to 
minimize the amount of data to be saved at those points. To 
prevent monopolization of the CPU, the compiler inserts some 
preemptive schedule points in loops, for example. 
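
Scheduling on request with explicit schedule points resembles cooperative multitasking. In the sketch below Python generators stand in for lightweight processes and each `yield` plays the role of a schedule point inside a loop. The round-robin policy and the names `worker` and `run` are assumptions, not a description of the Pooma scheduler.

```python
from collections import deque


def worker(name, iterations, log):
    """A lightweight process; each 'yield' marks a schedule point in the loop."""
    for i in range(iterations):
        log.append((name, i))
        yield                      # schedule point inserted in the loop


def run(processes):
    """Round-robin scheduler: a process runs until its next schedule point."""
    ready = deque(processes)
    while ready:
        process = ready.popleft()
        try:
            next(process)
            ready.append(process)  # still alive: back into the ready queue
        except StopIteration:
            pass                   # process finished


log = []
run([worker("a", 2, log), worker("b", 3, log)])
```

Because a switch happens only at a `yield`, little state needs saving at each switch, which is the property the text attributes to scheduling on request.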

The operating system directly supports the basic POOL 
communication primitives. The message handler is distrib¬ 
uted over the nucleus and the POOL support. The message 
handler of the nucleus provides a transport layer and sup¬ 
ports three different message types. It supports a fixed-sized 
message for executing routines with a small number of pa¬ 
rameters on a remote node; a copy message for copying data 
of any size to a predetermined location at the receiving node; 
and a variable-sized message. In the latter case, the receiving 
node in which the data is stored must allocate memory. The 
POOL support layer of the message handler offers two differ¬ 
ent buffering protocols, destination and global message. 24 
Destination buffering assumes the message can be allocated 
at the receiver; it crashes the system when this is not the 
case. By contrast, global message buffering inserts the mes¬ 
sage in a distributed queue in this case and fetches the mes¬ 
sage later, when sufficient memory is available. 
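
The difference between the two buffering protocols can be shown with a small model. All names are hypothetical, memory is counted in arbitrary units, a Python exception stands in for the system crash, and a single local queue stands in for the distributed queue. Message ordering is not addressed.

```python
from collections import deque


class Receiver:
    """A node with a fixed amount of message memory (arbitrary units)."""

    def __init__(self, capacity):
        self.free = capacity
        self.delivered = []

    def try_store(self, message):
        if message["size"] > self.free:
            return False
        self.free -= message["size"]
        self.delivered.append(message)
        return True

    def consume(self):
        """The object handles its oldest message, releasing its memory."""
        message = self.delivered.pop(0)
        self.free += message["size"]
        return message


def send_destination_buffered(receiver, message):
    """Assumes the receiver has room; failure is fatal in this protocol."""
    if not receiver.try_store(message):
        raise MemoryError("receiver out of message memory")


class GlobalMessageBuffer:
    """Parks messages that do not fit and retries them later."""

    def __init__(self):
        self.pending = deque()      # stands in for the distributed queue

    def send(self, receiver, message):
        if not receiver.try_store(message):
            self.pending.append((receiver, message))

    def retry(self):
        """Deliver parked messages for which memory has become available."""
        still_waiting = deque()
        while self.pending:
            receiver, message = self.pending.popleft()
            if not receiver.try_store(message):
                still_waiting.append((receiver, message))
        self.pending = still_waiting
```

Destination buffering is the cheaper path when memory is plentiful; the global variant trades extra bookkeeping for robustness when a receiver is temporarily full.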

The programmer, considered primarily responsible for 
proper allocation of the objects, specifies allocation pragmas

Table 4. Timings for get access.

Object type   With runtime checks (µs)   Without runtime checks (µs)
Process       426.0                      425.0
Server        2.8                        2.2
Data          2.8                        1.3
Value         2.0                        1.3


Table 5. Timings for method access.

Object type   With runtime checks (µs)   Without runtime checks (µs)
Process       426.0                      425.0
Server        7.0                        6.5
Data          7.0                        5.4
Value         6.2                        5.4


Table 6. Timings for memory management.

Operation       Small (µs)   Large (µs)
Nucleus alloc   20           67
Nucleus free    15           67
POOL alloc      25-30        77
POOL free       19           71


Table 7. Timings for process management.

Operation                         Time (µs)
Switch                            9
Nucleus switch                    19
POOL switch                       12
Nucleus insert into ready queue   14
POOL insert into ready queue      29-32


in the POOL code. With such a pragma the programmer can 
specify a set of nodes on which a new object can be allo¬ 
cated. The operating system then attempts to allocate the 
object on one of these nodes. 

A garbage collector removes redundant POOL objects (more 
specifically, objects that will no longer participate in the ex¬ 
ecution of the program). For this the operating system uses a 
distributed, on-the-fly, mark-and-sweep garbage collector. 25 
The algorithm runs concurrently with the POOL program (on
the fly) to exploit the available parallelism optimally. We used 
a mark-and-sweep strategy instead of reference counting 
because cyclic reference structures are very common in POOL. 
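
Why cycles favor mark-and-sweep can be seen in a small sequential sketch. It is not the distributed, on-the-fly algorithm used by the operating system: the heap is a plain Python list, the collector stops the program while it runs, and the names `Obj` and `mark_and_sweep` are hypothetical.

```python
class Obj:
    """A heap object holding references to other objects."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def mark_and_sweep(heap, roots):
    """Return the objects that survive; everything unreachable is garbage."""
    for obj in heap:
        obj.marked = False
    # Mark phase: follow references from the roots.
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)
    # Sweep phase: keep only marked objects.
    return [obj for obj in heap if obj.marked]


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs.append(b)
c.refs.append(d)          # c and d reference each other: a cycle
d.refs.append(c)          # that no root can reach
survivors = mark_and_sweep([a, b, c, d], roots=[a])
```

Under reference counting, `c` and `d` would each keep a count of one and never be reclaimed; the mark phase never reaches them, so the sweep removes both.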

Performance. Currently we are tuning the POOL imple¬ 
mentation; therefore we have only a limited number of per¬ 
formance figures as yet. 

Tables 4 and 5 display the effects of introducing server, 
data, and value objects. Table 4 lists the amount of time needed 
to call a method that simply extracts the value of an integer 
variable in a local object (a get operation). POOL provides a 
special notation for such a method, thus giving the compiler 
the opportunity to optimize the accesses with in-line code. 
When runtime checks are disabled, the compiler neither checks
whether the object has been assigned a proper value nor
performs deadlock checks. The table shows that
local process communication should certainly be optimized
further (we’ve paid little attention to that up to now). It also
shows that for other object types the performance is very
good (a comparable operation in a C program takes 1.0 µs).


Table 5 lists results of a similar operation that enforces a full 
method call. Such an operation compares to a C routine call, 
which takes 4.7 µs.

From these tables we can conclude that the POOL imple¬ 
mentation can almost keep up with the C implementation, if 
the programmer makes use of the proper object types. We 
should also take into account that the POOL compiler does 
not use very advanced optimization techniques (such as 
interprocedural dataflow analysis), so some room for im¬ 
provement still exists. 
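
The claim can be made concrete by computing ratios from the quoted timings. The numbers below are copied from Tables 4 and 5 and from the C timings in the text; the variable names are illustrative.

```python
# Microsecond timings copied from Tables 4 and 5 and the surrounding text.
get_with_checks = {"process": 426.0, "server": 2.8, "data": 2.8, "value": 2.0}
get_without_checks = {"process": 425.0, "server": 2.2, "data": 1.3, "value": 1.3}
call_without_checks = {"process": 425.0, "server": 6.5, "data": 5.4, "value": 5.4}
c_variable_access = 1.0
c_routine_call = 4.7

# How much slower a process object is than a server object for a get.
process_vs_server = get_with_checks["process"] / get_with_checks["server"]

# POOL relative to C when runtime checks are off.
get_vs_c = {kind: t / c_variable_access for kind, t in get_without_checks.items()}
call_vs_c = {kind: t / c_routine_call for kind, t in call_without_checks.items()}
```

With checks off, data and value objects come within a factor of about 1.3 of C for a get and about 1.15 for a full method call, whereas a process object is roughly 150 times slower than a server object.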

The memory manager employs the paging mechanism of 
the MMU provided by the DP board. Therefore a difference 
must be made between allocation requests smaller than the 
page size and larger than the page size. The timings of the 
memory manager depend on its usage. To evaluate the 
memory manager, we ran a typical POOL program (a paral¬ 
lel theorem prover) and measured the average allocation time. 
The results appear in Table 6. The first two operations
involve timings at the nucleus level of the operating system;
the latter two are measured at the compiler level. The POOL
allocation time depends on the context in which it is called.
These figures indicate that the memory manager is sufficiently
fast to cope with most POOL applications.

We also measured the time it took for the process manager 
to switch among processes and insert processes into the ready 
queue. Table 7 lists the results. The bare switch does not 
save any context beyond the program counter. The other 
switch times reflect speeds for nucleus-level processes and
cesses. A similar distinction is made for insertion in
in which the POOL timing depends on the con-
the operation is performed. We can conclude that
manager is sufficiently fast even for programs
ively small grain size.


one of the most important leaps forward made 
ma project is the easy access for application pro¬
to the full power of a parallel computer system,
er parallel computer projects concentrated on
are aspects, resulting in systems that were fast
program. The Pooma project radically broke
tradition. A main goal was to build a system on
mplex applications could be programmed easily
iciently. Therefore we did not choose the standard
is for parallel computer systems, which are easy
on all systems. Instead, we concentrated especially
problems that are difficult to map to traditional
chines: symbolic applications. These applications
have irregular computation patterns, which often
the input data. Since we did not (and still do
e automatic techniques for extracting parallelism
ams are sophisticated enough to achieve an ef-
omparable to programs with explicit parallelism,
the latter alternative.

d one of the most demanding applications, as
grammability is concerned, to be the Horn Clause
over, 26 which can also be used to execute logic
similar to Prolog. This theorem prover makes
eadth-first strategy and uses Or-parallelism. The
ovides a mechanism to prohibit the repeated ex-
of the same goal. Further, the prover makes use 
examples to prune the search tree. In this ap- 
new goals derive from old ones and axioms all 
the connection structures between these goals 
to changes, and many goals become obsolete 
time. For such an application the POOL features 
g new objects dynamically and changing their 
iction patterns seem indispensable. Furthermore, 
e collector takes care of clearing away the obsolete 




of processes manipulate the goals in the Horn 
over and are transferred from one to the other, 
transparency of the process locations and the 
very important. Objects or references to them 
nt from one process to the other without the 
ng knowledge of the underlying network structure,
nowledge of the location of the receiver or the
rated object.

retical work on parallel programs gave rise to a
derstanding of the incorporation of parallelism in
s. We observed that processes of different types 


typically occur only at the highest design levels. These 
processes correspond to different tasks in the application. 
They do not give rise to much parallelism but are impor¬ 
tant for the modular structure of the program. In general 
the communication patterns at this level are irregular. Most 
designers usually introduce parallelism at the lower design 
levels, in which several identical processes occur that perform 
the same task on different data items. At this level we recognize 
regular communication structures and simple communication 
protocols. We used certain basic communication structures 
in a number of applications: 

• Pipelines. We used pipelines when a number of trans¬ 
formations had to be applied to a large set of similar 
data items. 

• Trees. We typically used these structures for problems 
that can be solved by divide-and-conquer techniques. 

• Blackboards. In blackboards many processes communi¬ 
cate via a shared set of data. They take data items from 
the set, process them, and enter new items to it. 

• Hypercubes. These structures can serve to broadcast 
messages to or to collect data from a number of processes. 

• Shuffle exchanges. The most important use here is in 
sorting and in processing fast Fourier transforms. 
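
As one illustration of these structures, a pipeline can be sketched with Python generators. The generators run interleaved in a single thread, so the sketch shows only the communication structure, not parallel execution; the function names are hypothetical.

```python
def source(items):
    """Feed the data items into the pipeline one at a time."""
    for item in items:
        yield item


def stage(transform, upstream):
    """One pipeline stage: apply a transformation to each item as it arrives."""
    for item in upstream:
        yield transform(item)


def pipeline(items, transforms):
    """Chain one stage per transformation behind the source."""
    stream = source(items)
    for transform in transforms:
        stream = stage(transform, stream)
    return stream


result = list(pipeline(range(5), [lambda x: x + 1, lambda x: x * x]))
```

In a parallel setting each stage would be a separate process, so that different data items occupy different stages at the same time.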

As far as the efficiency of programs is concerned, we 
found that discovering sources for parallelism was seldom 
a problem. The difficulty mostly was to exploit this parallelism 
in a sensible way. The main causes for inefficiency are 
processor idle time and overhead introduced by commu¬ 
nication. Often the reduction of either of those causes 
results in an increase of the other, making it essential to 
find a good balance. Bosco, Cecchi, and Moiso 11 offer the 
best example of this relationship. Here, a dynamic load¬ 
balancing strategy, introducing as little communication as 
possible, decreases idle time. All aspects of efficiency are 
very sensitive to which processes are grouped on the same 
node. Therefore the programmer’s ability to influence the 
allocation of objects to nodes by adding pragmas to the 
program is very important. 


ESPRIT PROJECT 415 COVERED THE INVESTIGATION 
of six different language and execution models for paral¬ 
lel symbolic computer systems and design of corresponding 
languages and system implementations. The project resulted 
in a major contribution to the theoretical and practical 
understanding of this novel type of system and to the 
emergence of a scientific community in this area in Europe. 

The single-project framework provided a forum for dis¬ 
cussions and comparisons among the different ways to achieve 
parallel symbolic systems. In addition, the efforts on the 
object-oriented style delivered a scalable, highly parallel
system with a novel, high-performance communication
architecture. The parallel object-oriented language provides
programmers with a very flexible and dynamic model of 
parallelism. It permits concentration on the essential as¬ 
pects of a parallel design, while abstracting from many 
details of the underlying system. Despite the dynamic model, 
the implementation of the language results in excellent 
performance.

continued from p. 19

supporting windowing systems enjoy increasing popularity. 
In both cases, the workstation formerly carried out the corre¬ 
sponding processing. Similarly, one expects techniques for 
high-performance systems developed on machines such 
as Supernode to migrate into next-generation embedded 
systems. 

Transputer architecture 

The development of transputer architecture involved four 
main objectives: 

• to create a commercial product range that sets new
standards in ease of programming and engineering;

• to provide maximum performance to the user;

• to allow the exploitation of future developments in
VLSI technology within a compatible family; and

• to create a programmable component for building
systems with large numbers of concurrent computing
components.

A transputer contains a processor, memory, and a num¬ 
ber of standard point-to-point communications links—all 
integrated into one silicon chip (as shown in Figure 1). An 
external memory interface extends the on-chip memory. When 
appropriate, transputers also incorporate special-purpose 
processing and/or interfacing capabilities. Separating the 
external memory interface (for local memory) from the 
communications optimizes performance and minimizes 
contention. 



Figure 1. Processing and interfaces in transputer 
architecture. 


A system is constructed from one or more transputers
operating concurrently and communicating through
standard links. The programming language Occam (see box)
formalizes the computational model. Occam describes a
system as a collection of processes that operate
concurrently and communicate through channels.

Transputer processing 

The transputer directly implements the Occam model of 
concurrency. A hardware scheduler allows any number of 
Occam processes to share a single processor, and transputer 
instructions implement Occam message passing. An appli¬ 
cation designer can configure a collection of processes ready 
for execution on a network of transputers. Each transputer 
executes a component process and transputer links imple¬ 
ment Occam channels. 

Both internal and external communications use the same 
instructions, allowing for Occam program reconfiguration 
(such as using a different processes-to-processors alloca¬ 
tion) without recompilation. In particular, an application 
designer can configure an Occam program to execute on a 
small number of transputers for low cost or on a larger 
number of transputers for high performance. 

The transputer processor supports fast interrupt response
by providing two levels of priority. (Typically, the interrupt
response is less than 1 microsecond on a transputer with a
20-MHz clock; in the worst case, it is less than 4
microseconds.) Using the ALT construct (an Occam keyword
meaning alternative), a high-priority process waits for the
first of several inputs to become ready. It then executes the
specific piece of code that responds to the particular interrupt.

The processor treats access to a timer as an input. In a 
delayed input, the process waits until the timer reaches an 
appropriate value. The processor supports an arbitrary number 
of timer inputs. A programmer can also use a timer input 
within an ALT construct as a time-out on a communication. 
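
The time-out idiom can be illustrated outside Occam. The following is a minimal Python sketch, not transputer code: a thread-safe queue stands in for an Occam channel, and the timeout argument of queue.Queue.get stands in for the timer guard of an ALT. The helper name input_with_timeout is hypothetical, and a Python queue is buffered, whereas an Occam channel is synchronized.

```python
import queue


def input_with_timeout(channel, timeout_s):
    """Wait for a message on `channel`, giving up after `timeout_s` seconds.

    Loosely mirrors an ALT with two guards: a channel input and a timer.
    Returns ("input", value) if a message arrived first, else ("timeout", None).
    """
    try:
        value = channel.get(timeout=timeout_s)
    except queue.Empty:
        # The timer guard became ready before any input did.
        return ("timeout", None)
    return ("input", value)


channel = queue.Queue()
channel.put(42)
first = input_with_timeout(channel, 0.05)   # a message is already waiting
second = input_with_timeout(channel, 0.05)  # nothing arrives, so the timer wins
```

In the sketch the caller chooses what to do after a time-out, just as the Occam programmer writes a separate branch for the timer guard.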

Developing the transputer instruction set involved five 
design objectives: 

• to implement Occam effectively, so that high-level
language usage results in the effective use of silicon
capability, and that highly concurrent programs execute with
minimum overheads;

• to implement Occam simply and directly to facilitate easy,
straightforward program compilation, and to ensure that
lower level programming is unnecessary;

• to provide word-length independence, so that a
program executes using processors of different word lengths
without recompilation;

• to provide position independence, so that programs and
workspaces can be allocated anywhere in memory without
recompilation; and

Basic Occam concepts

The Occam language 3 programs concurrent, distributed
systems. The word distributed emphasizes the unsuitability
of previous languages in this area. Occam describes a
system as a collection of concurrent processes that
communicate with each other and with peripheral devices
through channels. Concurrent processes do not communicate
through shared variables; thus Occam is particularly suitable
for programming systems with no memory sharing
between processors.

Occam provides three primitive processes:

v := e    Assign expression e to variable v
c ! e     Output expression e to channel c
c ? v     Input from channel c to variable v

Occam provides constructs that combine primitive
processes:

SEQ    Components execute one after another (sequential)
PAR    Components execute together (parallel)
ALT    First ready component executes (alternative)

The language also provides IF and WHILE constructs.
Each construct is itself a process and it may be used as
a component of another construct. Occam syntax uses
indentation to indicate program structure.

A programmer writes parallel programs by using
channels, inputs, and outputs combined in parallel and
alternative constructs. Each Occam channel provides a
communication path between two processes. Communication
is synchronized and takes place when both
the inputting and the outputting processes are ready.
The data to be communicated is copied from the outputting
process to the inputting process, and both processes
continue.

An ALT process waits for input from any one of a
number of channels. ALT takes input from the first to
be used for output by another process.

Occam provides a replicated constructor. For example

    SEQ i = base FOR count
      a[i] := i

implements as a loop and is equivalent to

    SEQ
      a[base] := base
      a[base + 1] := base + 1
      ...
      a[base + count - 1] := base + count - 1

Replication used with PAR provides arrays of similar
processes. ALT and IF can also be replicated.
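
For readers more familiar with conventional languages, the replicated SEQ above corresponds to an ordinary counted loop. The Python sketch below is only an analogy; the function name replicated_seq is hypothetical.

```python
def replicated_seq(a, base, count):
    """Loop form of:  SEQ i = base FOR count  /  a[i] := i"""
    for i in range(base, base + count):
        a[i] = i
    return a


# base = 2 and count = 3 assign a[2], a[3], and a[4] only.
a = replicated_seq([None] * 6, base=2, count=3)
```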

Using a construct as a component in another con¬ 
struct makes it possible to design a system as a set of 
nested processes (for example, by using PAR within 
PAR as shown in Figure C). The messages input and 
output on the channels of a process fully specify the 
process, and this completely hides its internal structure 
from the outside world. 

Internally, the programmer can structure the process 
itself as a set of nested processes. At any level of design, 
the designer works with a small and manageable set of 
processes. 




Figure C. Design of nested Occam language processes. 
• to provide low-latency response to communications with
external devices.

The resulting design 4 uses a simple linear address space, 
six functional registers for sequential programming (see Fig¬ 
ure 2), and additional registers as queue pointers to support 
concurrency. 



Figure 2. Function of transputer registers for sequential 
programming. 

The six registers for sequential programming are the 

• workspace pointer that points to a storage area contain¬ 
ing local variables, 

• instruction pointer that points to the next instruction to 
be executed, 

• operand register used to form instruction operands, and 

• A, B, and C registers that form an evaluation stack. This 
stack holds the operands and intermediate results for 
expression evaluation. 

The hardware scheduler allows any number of processes to
execute together by sharing processor time. At any time, a
concurrent process is either active (currently executing or
on a list awaiting execution) or inactive (ready to input,
ready to output, or waiting until a specified time).

A list holds active processes awaiting execution. This linked 
list of process workspaces uses two registers in implementa¬ 
tion—one that points to the first process on the list and one 
that points to the last process. The hardware scheduler main¬ 
tains two such lists—one for high-priority processes, the other 
for low-priority processes. 
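
The list structure just described can be sketched in Python. This is an illustrative model rather than the transputer's microcode: the class names are hypothetical, and the rule that the high-priority list is always served first is an assumption made for the sketch.

```python
class Process:
    """A process workspace; `next` is the link used while it is queued."""

    def __init__(self, name):
        self.name = name
        self.next = None


class ReadyList:
    """Linked list of workspaces tracked by a front and a back pointer."""

    def __init__(self):
        self.front = None  # first process awaiting execution
        self.back = None   # last process awaiting execution

    def enqueue(self, proc):
        proc.next = None
        if self.front is None:
            self.front = proc
        else:
            self.back.next = proc
        self.back = proc

    def dequeue(self):
        proc = self.front
        if proc is not None:
            self.front = proc.next
            if self.front is None:
                self.back = None
            proc.next = None
        return proc


class Scheduler:
    HIGH, LOW = 0, 1

    def __init__(self):
        self.lists = [ReadyList(), ReadyList()]  # one list per priority

    def make_ready(self, proc, priority):
        self.lists[priority].enqueue(proc)

    def next_process(self):
        # Assumption: serve the high-priority list before the low-priority one.
        for ready in self.lists:
            proc = ready.dequeue()
            if proc is not None:
                return proc
        return None


sched = Scheduler()
sched.make_ready(Process("low-a"), Scheduler.LOW)
sched.make_ready(Process("high-a"), Scheduler.HIGH)
sched.make_ready(Process("low-b"), Scheduler.LOW)

order = []
proc = sched.next_process()
while proc is not None:
    order.append(proc.name)
    proc = sched.next_process()
```

Because only the two end pointers are touched, adding or removing a process takes constant time regardless of how many processes are waiting.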

The implementation of the instruction set uses a single 
level of microcode. Many instructions execute in one cycle 
(50 ns on a 20-MHz transputer); many of the rest execute in 
two cycles. Some complex functions (such as block move) 
take an arbitrary number of cycles. These instructions still 
provide higher performance than possible with software. 

To limit the latency figure for switching between low and
high priority, time-consuming instructions allow a switch
during execution. Consequently, the processor never takes
more than 4 microseconds to switch between low priority
and high priority.

A context switch between processes executing at low pri¬ 
ority occurs only when the evaluation stack contains no use¬ 
ful contents. With minimal need to save and restore registers, 
the processor implements concurrency very efficiently. 

The instruction format uses a very compact encoding based
on 1-byte instructions. Prefixing instructions are used to form
long operands. The instruction size is independent of the
word length. In general, a program requires much less
storage than an equivalent program for a conventional
or RISC microprocessor. Since a program requires less
storage, instruction fetches use less memory
bandwidth. As the transputer accesses memory one word at
a time, the processor receives several instructions for every
fetch (depending upon the number of bytes in a word).
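
How prefixing builds a long operand out of 1-byte instructions can be sketched as follows. The article does not give the encoding; the details used here (a 4-bit function code and 4-bit data field per byte, and the function codes for pfix, nfix, and ldc) are taken from the published transputer instruction set as best recalled, so treat them as assumptions. The function names encode and decode are hypothetical.

```python
# Function codes: assumptions based on the published instruction set,
# not stated in this article.
PFIX, NFIX, LDC = 0x2, 0x6, 0x4


def encode(function, operand):
    """Return the byte sequence for `function` applied to `operand`."""
    if 0 <= operand < 16:
        return [(function << 4) | operand]
    if operand >= 16:
        return encode(PFIX, operand >> 4) + [(function << 4) | (operand & 0xF)]
    # Negative operands start with a negative prefix.
    return encode(NFIX, (~operand) >> 4) + [(function << 4) | (operand & 0xF)]


def decode(code):
    """Run the operand-register logic over `code`; return (function, operand)."""
    oreg = 0
    for byte in code:
        function, data = byte >> 4, byte & 0xF
        oreg |= data
        if function == PFIX:
            oreg <<= 4
        elif function == NFIX:
            oreg = (~oreg) << 4
        else:
            return function, oreg
    raise ValueError("code ended inside a prefix sequence")


small = encode(LDC, 5)         # fits in a single byte
large = encode(LDC, 0x754)     # needs two prefix bytes
negative = encode(LDC, -1)     # needs one negative prefix
```

Small operands, which are the most common, cost a single byte, and the same byte stream decodes identically whatever the word length of the processor.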

In addition to Occam, high-performance compilers for C, 
Fortran, Pascal, and Ada have been implemented for the 
transputer. 

Transputer communications 

A link between two transputers implements a pair of Occam 
channels, one in each direction. Two one-directional signal 
lines connect a link interface on one transputer to a link 
interface on the other transputer. Each signal line carries data 
and control information. 

Communication through a link involves a simple protocol, 
which supports the synchronized communication of Occam. 
The protocol provides for the transmission of an arbitrary 
sequence of bytes, which allows transputers of different word 
lengths to communicate. 

Each byte transmits as a start bit, followed by a one bit, 8 
data bits, and a stop bit (see Figure 3a). After transmitting a 
data byte, the sender waits until receiving an acknowledgment, 
which consists of a start bit followed by a zero bit (see Figure 
3b). The acknowledgment signifies both that a process re¬ 
ceived the acknowledged byte, and that the receiving link is 
ready to receive another byte. The sending process proceeds 
only after receiving acknowledgment for the final byte. 
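
The framing described above can be written out bit by bit. In this hedged sketch the line levels of the start and stop bits and the order of the data bits are assumptions, since the article does not state them; the function names are hypothetical.

```python
START_BIT, STOP_BIT = 1, 0  # assumed line levels, not given in the article


def data_packet(byte):
    """Start bit, a one bit, 8 data bits, and a stop bit (11 bits in all)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a data packet carries exactly one byte")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # LSB first: an assumption
    return [START_BIT, 1] + data_bits + [STOP_BIT]


def ack_packet():
    """Start bit followed by a zero bit."""
    return [START_BIT, 0]


def parse(bits):
    """Classify a received packet; the second bit tells data from acknowledgment."""
    if bits[0] != START_BIT:
        raise ValueError("missing start bit")
    if bits[1] == 0:
        return ("ack", None)
    value = sum(bit << i for i, bit in enumerate(bits[2:10]))
    return ("data", value)


packet = data_packet(0xA5)
```

Because the packet type is decided by the bit after the start bit, data and acknowledgments can share one signal line without ambiguity.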

Data bytes and acknowledgments multiplex down each 
signal line. An acknowledgment transmits as soon as recep- 
Figure 3. Formats of data links (a) and acknowledgment (b).
The use of point-to-point serial communications, instead
of a bus, offers the following advantages:

• simplified board layout and backplane design;

• increased communications bandwidth, as many links in
a system operate concurrently; and

• easy interconnection of devices with different word
lengths and performance.

Transputers with different word lengths and performance
can work together, ensuring the easy upgrading of systems
as the technology advances. It is not necessary to
reduce a connected set of components to the performance
of the slowest component!

Each transputer contains a separate communications engine,
allowing communications to proceed in parallel with
the execution of processor instructions. Indeed, many
applications completely overlap communications and processing,
increasing overall system throughput.

Link adapters and a link switch add to the flexibility
of communications. The link adapters provide an
interface between a link and a byte-wide port. The Inmos
C004 switch provides a crossbar switch between 32 links,
controlled by a separate configuration link. The C004 is
cascadable, allowing for the construction of arbitrary networks
of transputers (limited only by the number of links on each
transputer; most transputers contain four links).


Programming paradigms

Although designing transputer-based hardware to perform at
a required level of performance is easy, one must ensure
that the software configuration exploits the hardware
architecture. Software structured as a sequential program with
conventional compilation operates no faster on 10 transputers
than on one!

In embedded systems, the design of software architecture
and hardware architecture optimally occurs hand-in-hand,
resulting in essentially a system design activity. 5

To configure a program for a network of transputers, the 
applications designer identifies the parallelism. The designer 
subsequently maps the parallel processes onto the transputer 
network to optimize the system according to the design crite¬ 
ria (such as maximized performance and minimized latency). 

Experience gained to date with parallel systems—particu¬ 
larly with ESPRIT projects—identified a number of program¬ 
ming paradigms that help to structure systems designs. 6 These 
paradigms can be written in languages such as Ada, 7 or can 
employ parallel extensions or libraries in C or Fortran. How¬ 
ever, Occam 3 describes these paradigms most conveniently. 

Descriptions of the paradigms for algorithmic parallelism, 
geometric parallelism, and farming appear below: 

1) Algorithmic parallelism. The designer splits the applica¬ 
tion into functional units. In simple cases, these units 
can form a pipeline; in general, they can form more 
complex structures such as feedback loops. The various 
stages potentially operate in parallel. With a modest 
amount of buffering, the communication of data or par¬ 
tially computed results between stages will, in many cases, 
completely overlap processing (eliminating communi¬ 
cations overhead). The structure maps onto one or more 
transputers, up to the number of components in the 
structure, with the communications performed internally 
or via transputer links, as appropriate. For a pipe¬ 
line structure, its slowest stage limits the maximum 
performance. 

2) Geometric parallelism. Designers partition the design on 
a regular basis. For example, they can divide a screen 
image into quadrants, or a matrix operation into 
submatrices. A separate process performs the computa¬ 
tion for each partition. The processes often operate in¬ 
dependently or interact only with immediate neighbors. 
A particularly beneficial use of geometric parallelism in¬ 
volves scaling up the size of a problem to be solved 
(such as performing weather forecasting on a finer mesh). 

Designers usually implement geometric parallelism by
allocating processes straightforwardly onto a regular
physical network. Attaining maximum performance usually
requires attention to granularity.
Processes communicate very efficiently with their
immediate neighbors when executing on the same
processor, and less efficiently when the neighbors execute
on separate processors. Where one process is allocated to
each processor (see Figure 4a), communication
costs dominate performance in most instances. With all
processes on a single processor, computation time
dominates performance.

Computation and communication achieve balance at
an intermediate level of granularity (see Figure 4b). This
balance results from a boundary-to-area effect (for
two-dimensional grids) in which the amount of communication
at the boundary varies linearly with the grain size,
and the area of computation varies as the square of the
grain size. Compared with Figure 4a, the allocation in
Figure 4b involves four times the amount
of computation, but only twice the communication.
A similar surface-to-volume effect
occurs for three-dimensional grids.

Figure 4. A simple allocation of geometric parallelism
granularity (a); a balanced allocation of geometric
parallelism (b). (Squares denote processors, circles
processes, and lines Occam channels.)

3) Farming. Designers divide the application into small, 
similar pieces (for example, when needed to process a 
large number of similar data items). Each transputer in a 
network provides a server process, and a master pro¬ 
cess allocates (or “farms out”) work to the servers as 
they become free. Farming provides two major benefits. 
First, it automatically balances the load—a server com¬ 
pleting one piece of work immediately proceeds to the 
next. Second, it functions relatively independent of the 
topology—any reasonably linked network works well. 
The limit to performance is the rate of dispensing work 
and handling results. 
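
The farming paradigm maps naturally onto a shared work queue. The sketch below uses Python threads as stand-ins for transputers and queues as stand-ins for channels; the function name farm and its parameters are hypothetical, and the sketch says nothing about real transputer performance.

```python
import queue
import threading


def farm(work_items, worker_fn, n_servers):
    """Farm out `work_items` to `n_servers` server threads as they become free."""
    items = list(work_items)
    tasks = queue.Queue()
    results = queue.Queue()
    for index, item in enumerate(items):
        tasks.put((index, item))
    for _ in range(n_servers):
        tasks.put(None)  # one stop marker per server

    def server():
        while True:
            task = tasks.get()
            if task is None:
                return
            index, item = task
            # A server that finishes one piece immediately takes the next,
            # which is what balances the load automatically.
            results.put((index, worker_fn(item)))

    threads = [threading.Thread(target=server) for _ in range(n_servers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # The master collects results and restores the original order.
    ordered = [None] * len(items)
    while not results.empty():
        index, value = results.get()
        ordered[index] = value
    return ordered


squares = farm(range(10), lambda x: x * x, n_servers=3)
```

The single task queue plays the role of the master process: its dispensing rate, together with result handling, bounds the achievable throughput, as noted above.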

The Occam model of parallelism offers a significant ben¬ 
efit: Messages sent and received completely define a process. 
The designer can structure a process internally as a set of 
processes, thereby using any desired level of nesting. A very 
powerful technique, therefore, combines the above paradigms 
in one application, such as a farm of geometric-array servers 
functioning with pipeline components. (The earlier Occam 
box diagrams the Occam program that encapsulates all three 
paradigms within a simple top-level structure.) 

The designer can use these models to easily structure
applications to define a large amount of parallelism. The
narrow and easily defined interface specifications allow for easy
reasoning about an application’s correctness.

The next-generation transputer 

For the past two years, a team at Inmos’s Bristol, England,
design center has been working to enhance the transputer’s
performance and suitability for embedded systems. From this
work, a new product family (based around a new processor
code-named H1) will be launched in Spring 1991. 8

The team’s design goal called for establishing a new stan¬ 
dard in single-processor performance while enhancing the 
transputer family’s position as the premier multiprocessing 
microprocessor. It also required maintaining upward com¬ 
patibility with existing transputer products. 

To meet these goals, the team developed a new
microarchitecture that implements the same instruction set as the
existing Inmos T805 transputer. The H1 provides an
order-of-magnitude increase in performance, combined with
enhanced capabilities to support the software standards emerging
in the embedded systems marketplace.

The H1 architecture includes such key features as a
pipelined, superscalar processor combined with on-chip cache
RAM, and improved communications that provide a new
degree of freedom in multiprocessor programming.

To complement the H1 transputer, Inmos is now designing
a range of network communication products based on a
new 100-Mbit/s link protocol. The protocol supports the
dynamic routing of messages between processors.

H1 performance

The H1 provides a peak performance in excess of 130
MIPS (million instructions per second) and 20 Mflops
(million floating-point operations per second) and a sustained
performance exceeding 60 MIPS and 10 Mflops. It maintains
instruction-set compatibility with the T805.

A number of design features contribute to the achieve¬ 
ment of these performance levels. The processor itself uses a 
pipelined, superscalar architecture, which executes up to eight 
instructions on each clock cycle and operates at a clock speed 
of 50 MHz. The number of cycles required to execute many 
instructions—such as integer and floating-point multiply, and 
logical shift—decreases significantly. 

Unlike other superscalar machines, the H1 architecture does
not require an advanced compiler to schedule the different
functional units in the processor. Hardware controls the flow
of multiple instructions through the pipeline. It is not
necessary to modify existing compilers or recompile source code.

An advanced submicron, CMOS (complementary metal- 
oxide semiconductor) process allows a high transistor count 
and high clock frequency operations. This process enables 
the implementation of a sophisticated processor and pro¬ 
vides 16 Kbytes of on-chip cache memory. 

The move to a cached architecture is a radical
development. The 16-Kbyte cache is sufficiently large to achieve high
hit rates for most applications. The H1 still allows for directly
addressed, on-chip RAM for applications that need only small
amounts of memory, or that cannot tolerate the indeterminate
performance caused by cache-line misses.

The design team took great care to ensure that the H1
transputer will provide high performance levels in
low-component-count systems. For example, the H1 will provide a
programmable memory interface with a 64-bit data bus
sustaining high data transfer rates for cache-line refill. The
interface supports four independent banks of external memory,
and the timing for each bank is configured independently
from software. For example, an application designer can
choose to fill two banks with dynamic RAM, one bank with
virtual RAM, and the other with peripherals. Such a system
frequently requires no external support logic.

H1 error-handling and user-mode processes

The H1 transputer hardware supports the same scheduling
algorithms used by current-generation transputers, such as
the T805. In addition, on the H1 transputer each process may
use a second process (known as a trap handler). When an
error (such as an integer overflow or a floating-point
error) occurs, control transfers to the trap handler. The trap
handler deals with the error in software and, in most
instances, then returns control to the process in which
the error occurred.

The H1 also supports a separate user mode. This mode
prevents privileged instructions (including communications
and scheduling instructions) from executing. It checks and
translates all memory accesses from a logical to a physical
address space.


Memory protection and address-translation mechanisms 
specifically support secure programming and debugging in 
embedded systems. For dedicated (single-user) systems, the 
protection aids the detection of programming errors. For 
multiuser, general-purpose computing systems, it protects 
users and the operating system from erroneous programs. 

The protection and translation mechanisms are optimized 
for the requirements of embedded systems. These mecha¬ 
nisms allow the processor to execute code in protected (user) 
mode at the same speed as normal processes without the 
performance overhead involved in supporting page-based, 
virtual memory. 


In contrast, but in keeping with its intended market,
additional H1 transputer enhancements allow programmers to
write more efficient real-time kernels. These enhancements
access and control the state of the machine, the process and
timer queues, and time slicing and interruptibility mechanisms.

New freedom 

A limitation in exploiting existing transputer networks is 
the need to match the parallel structure of the algorithms 
used to the interconnectivity provided. In the worst case, a 
specific machine possesses a fixed topology. In the best case 
(a totally reconfigurable architecture such as the Supernode), 
the limitation of four links per transputer restricts mapping. 
This limitation can result in poor software portability, 
nonoptimum design, and scalability problems. 

The H1 product family largely eliminates this problem by
providing hardware to allow transputer connections via a
low-latency communication network. It supports
communication channels between any two processes anywhere in the
network.

The hardware simplifies programming because designers
do not need to consider how to allocate processes to the
transputer network until after the program is
written. They can use different allocations on different
machines and they can change the allocation to optimize
performance. In addition, it is possible (at least in principle) to
let the compiler make this allocation, effectively removing all
configuration details from the program. Pountain 9 discusses the
communications capabilities of the H1 product family in
greater detail. The following paragraphs sum up these
capabilities.

The H1 transputer itself contains a separate
communications processor, which multiplexes a large number of logical
communication links (virtual links) along each of its physical
links. Each virtual link supports one Occam channel in each
direction.

The communications processor transmits messages as a 
sequence of packets, which all contain 32 bytes of data ex¬ 
cept the last packet. Each message packet starts with a header, 
which routes the packet through the communication net¬ 
work and identifies the destination virtual link on the remote 
transputer. 
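
The packet scheme can be sketched in a few lines of Python. The representation of a packet as a (header, data) pair and the function names are assumptions made for illustration; the article gives only the 32-byte payload size and the role of the header.

```python
PACKET_DATA_BYTES = 32  # every packet except the last carries 32 bytes of data


def packetize(message, header):
    """Split `message` (bytes) into packets, each tagged with `header`.

    Note: in this sketch an empty message yields no packets at all.
    """
    packets = []
    for offset in range(0, len(message), PACKET_DATA_BYTES):
        chunk = message[offset:offset + PACKET_DATA_BYTES]
        packets.append((header, chunk))
    return packets


def reassemble(packets):
    """Concatenate packet payloads in arrival order."""
    return b"".join(chunk for _header, chunk in packets)


message = bytes(range(70))
packets = packetize(message, header=5)   # header 5 is an arbitrary example
sizes = [len(chunk) for _header, chunk in packets]
```

A 70-byte message therefore travels as two full packets and one short final packet, all carrying the same header so that each can be routed independently.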

Designers construct a separate communications network
using the Inmos C104 routing device and can use one C104
to connect small numbers of H1 transputers. In larger systems,
designers can use C104 connections to form a hypercube, a
multidimensional grid, or a tree network.

Each C104 provides 32 bidirectional links. The header of
each packet arriving on a link input determines the link on
which to output the packet. As soon as the link output is
free, the whole packet transmits through it.

An algorithm known as interval labeling decides through
which link to send a packet. In interval labeling, each output
link is associated with a contiguous set of header values (an
interval). The header of an incoming packet lies within only
one interval, and the packet transmits to the associated link.
Optimum, deadlock-free labeling schemes exist for each of
the common network topologies.
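
The routing decision made for each packet amounts to an interval lookup. The Python sketch below shows one way to express it; the class name, the half-open interval convention, and the example intervals are all assumptions, and the sketch does not attempt to construct a deadlock-free labeling.

```python
import bisect


class IntervalRouter:
    """Maps a packet header to an output link by interval lookup."""

    def __init__(self, intervals):
        # intervals: (low, high, link) triples, half-open [low, high),
        # assumed not to overlap.
        self.intervals = sorted(intervals)
        self.lows = [low for low, _high, _link in self.intervals]

    def output_link(self, header):
        # Find the last interval whose lower bound does not exceed the header.
        i = bisect.bisect_right(self.lows, header) - 1
        if i >= 0:
            low, high, link = self.intervals[i]
            if low <= header < high:
                return link
        raise ValueError("header %r lies in no interval" % (header,))


# Hypothetical labeling for a router with three output links.
router = IntervalRouter([(0, 4, 0), (4, 10, 1), (10, 16, 2)])
links = [router.output_link(h) for h in (0, 3, 4, 9, 10, 15)]
```

Each router needs to store only one pair of bounds per output link, however many destinations the network contains.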

The C104 provides additional facilities to connect networks
together and reduce the impact of message congestion on
worst-case latency and bandwidth in heavily loaded networks.

The transputer is well established as a highly
cost-effective processor, particularly in embedded applications. It
provides unique advantages for applications that require more 
than one processor, and serves as the basis for many research 
programs in parallel computing. 

Inmos developed the current generation of transputers as 
general-purpose components for special-purpose machines. 
The introduction of a higher performance transputer—sup¬ 
porting virtual communications, memory protection, and other 
advances—represents a significant step toward the develop¬ 
ment of general-purpose, multiprocessing computing systems. 
Research continues on the architecture to effectively exploit 
the full capabilities of VLSI technology during the 1990s. 
Meanwhile, the H1 provides highly efficient implementations
of conventional operating systems and real-time kernels. It
greatly reduces the cost of porting existing software and
upgrading existing applications to take advantage of the
transputer’s capabilities.
ties required to execute those commands, and the Object
Manager provides the shared object storage. 

The Session Manager component provides the mechanism 
by which an application starts a database session. It creates 
an instance of each of the Request Manager and the Data 
Manager for each database session. 

Figure 5 also shows the main interfaces between the com¬ 
ponents of the system. The first is ESQL, which is used by an 
application to access the database system. Lera is an extended 
relational algebra that is used between the Request Manager 
and the Data Manager. Last is the Process Control Language 
interface provided by the kernel. 

A set of ESQL commands forms the input to the Request 
Manager, which compiles these commands in five stages: 

• Syntax analysis. This stage parses the input and
converts it to an internal structure. It also performs type
checks.

• Logical optimization. The logical optimizer reorganizes 
the query by applying transformation rules to the query. 
These transformations perform functions such as predi¬ 
cate migration to minimize the size of intermediate re¬ 
sults, the elimination of common subexpressions to 
remove redundant work, operator transformation to 
combine operators to simplify the task of the physical 
optimizer, application of constraints, and optimization 
of recursive queries. 

• Physical optimization. The physical optimizer deter¬ 
mines the order of the basic operations to minimize in¬ 
termediate results, selects the best access path, chooses 
the algorithms, and determines the optimal degree of 
parallelism in the query. The choice of these options is 
based on the minimization of a cost function. 

• Parallelization. The parallelizer translates the interme¬ 
diate form generated by the physical optimizer into the 
parallel program representing the query. 

• Code generation. This stage performs the final genera¬ 
tion of the object module containing machine code and 
calls to the runtime facilities of the Data Manager. 

As Figure 5 shows, the Request Manager consists of four 
main components: the monitor, analyzer, compiler, and cata¬ 
log manager. The monitor provides the operational interface 
between the application and its instance of the Request 
Manager. The analyzer performs the first stage of the
compilation of a query, and the compiler performs stages 2 to 5.

To support the management of the relations in a database
and the compilation and optimization of queries, the Request
Manager maintains a catalog, sometimes called a metabase,
of information about the relations and schema in the
database. The catalog manager provides the Request Manager
with a simple interface for accessing this data.

The parallel programs generated by the Request Manager 
execute in the runtime environment provided by the Data 
Manager, which consists of four main components: 

• Relational Execution Model. This runtime library
includes relational operations; operations supporting the
ADT, objects, and rules; and control operators.

• Relation Access Manager. This manager provides a glo¬ 
bal abstraction of the relations in the database. That is, it 
hides the distributed nature of the relations from the 
operations in the Relational Execution Model. The man¬ 
ager also provides the mechanism for calling the appro¬ 
priate access methods for the indexes associated with a 
relation. 

• A set of access methods. These methods provide the 
mechanisms for accessing the tuples of a relation. One 
access method will implement each index associated 
with a relation. The indexes offer fast methods for ac¬ 
cessing the tuples of a relation. 

• Basic Relational Execution Model. This parallel program 
environment provides abstractions tailored for the effi¬ 
cient execution of Request Manager programs. 

The Object Manager provides basic object storage and 
manipulation facilities required to support the database sys¬ 
tem. This manager stores persistent objects, controls concur¬ 
rent use of shared objects, and provides logging and recovery 
facilities for transaction support. We based the Object Man¬ 
ager on the Arjuna system 5 with ideas incorporated from the 
CHOICES 6 and Camelot 7 projects. 

Parallel execution of queries. The major influences on 
the Relational Execution Model design were the DDC project 8 
and the Bubba project. 9 DDC was a project in the ESPRIT I 
program that had the objective of building a multiproces¬ 
sor database machine. The Bubba project was a multiprocessor 
database system. We based the parallel execution of the que¬ 
ries on the following principles: 

1) The relations are horizontally partitioned into fragments 
that are distributed across the set of available processing 
elements of the machine. One of the advances in the 
physical optimizer is the development of methods for 
determining the degree of parallelism in an operation. 
These methods allow the system to determine the opti¬ 
mal number of processing elements to be used during 
evaluation. For base relations the assignment of frag¬ 
ments to physical processing elements is relatively static. 
For the intermediate relations however, the assignment 
occurs at execution time. We refer to the set of processing 
elements across which a relation is partitioned as the 
home of the relation. 
2) Where possible, processing takes place at the location
of the data so the data is not moved. Naturally, this is 
not possible when an operation involves more than one 
relation. In this case the optimizer must choose the local 
operations in such a way as to minimize the movement 
of data. 

3) The separation of data flow and control flow allows 
optimizations that significantly reduce the number of 
control messages. 

A standard exemplar that is being used within the project 
forms the basis of the description of the parallel evaluation of 
queries. This exemplar is a share management system. The 
schema in Figure 6 defines two relations from the exemplar. 


CREATE TABLE scost (
    share-id c4,
    cost integer2,
    currency c2);

CREATE TABLE exchange (
    currency c2,
    rate integer2);

and the query is: 

SELECT share-id, cost, rate FROM scost, exchange 
WHERE cost < 100 AND scost.currency = exchange.currency 


Figure 6. Share management system exemplar. 

We plan to extend the Create table command to allow 
users to specify the distribution algorithm, the attribute to be 
used, and the size of the home. In the absence of user-supplied 
information the physical optimizer chooses these parameters 
for the relations. The Data Manager determines the mapping 
of the relations based on the sizes of the relations’ homes 
and the loading of the processing elements. 

In this example we assume that the Data Manager chooses
processing elements 1, 3, 8, and 9 for Scost and 4 and 5 for
Exchange. We also assume that a hash function on Share-id
distributes Scost, and a hash function on Currency distributes
Exchange.

When the Request Manager compiles the query, the physi¬ 
cal optimizer decomposes the Join operation implied by the 
query into two suboperations Sel and Join. Sel prefilters the 
local fragment of Scost for those tuples in which cost is less 
than 100. Sel then distributes the tuples using the hash func¬ 
tion for Exchange. The Join suboperation joins a tuple from 
Scost with the local fragment of Exchange. A trigger message 
sent to the Sel operations starts the processing of the query. 
Figure 7 illustrates the execution of this query. 
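
The decomposition into Sel and Join can be sketched as a small Python simulation. This is only an illustration under stated assumptions: the function names (`part`, `distribute`, `sel`, `join`, `run_query`) are hypothetical, `part` is a toy stand-in for the unspecified hash distribution function, and message passing is modelled with in-memory lists rather than EDS messages.

```python
# Illustrative simulation of the Sel/Join decomposition; not EDS code.
# Assumptions: PE numbers follow the example in the text; part() is a
# hypothetical stand-in for the unspecified hash distribution function.

SCOST_HOME = [1, 3, 8, 9]      # processing elements holding Scost fragments
EXCHANGE_HOME = [4, 5]         # processing elements holding Exchange fragments


def part(key, home):
    """Map a key to one processing element of a home (toy hash)."""
    return home[sum(ord(ch) for ch in key) % len(home)]


def distribute(rows, attr, home):
    """Horizontally partition rows across a home by hashing one attribute."""
    fragments = {pe: [] for pe in home}
    for row in rows:
        fragments[part(row[attr], home)].append(row)
    return fragments


def sel(fragment):
    """Runs at each Scost PE: prefilter locally, then route each surviving
    tuple to the Exchange PE that owns its currency."""
    for row in fragment:
        if row["cost"] < 100:
            yield part(row["currency"], EXCHANGE_HOME), row


def join(local_exchange, incoming):
    """Runs at each Exchange PE: join incoming tuples with the local fragment."""
    rates = {row["currency"]: row["rate"] for row in local_exchange}
    return [(t["share_id"], t["cost"], rates[t["currency"]])
            for t in incoming if t["currency"] in rates]


def run_query(scost_rows, exchange_rows):
    scost = distribute(scost_rows, "share_id", SCOST_HOME)
    exchange = distribute(exchange_rows, "currency", EXCHANGE_HOME)
    inbox = {pe: [] for pe in EXCHANGE_HOME}   # message streams into Join
    for pe in SCOST_HOME:                      # the trigger starts every Sel
        for target, row in sel(scost[pe]):
            inbox[target].append(row)
    result = []
    for pe in EXCHANGE_HOME:
        result.extend(join(exchange[pe], inbox[pe]))
    return sorted(result)
```

Because Sel routes each tuple with the same function that distributed Exchange, every Join sees only tuples whose matching exchange rate is stored locally.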




Legend: message path; PE, processing element; o, operation;
□, relation fragment.


Figure 7. An example of a query execution. 

This simple example illustrates the main principles of the 
computational model: 

• relations are partitioned into fragments, which are dis¬ 
tributed across their homes; 

• relational operations decompose into operations that 
execute at the home of the relations on which they op¬ 
erate and so use purely local data; and 

• their inputs are either a stream of messages or a frag¬ 
ment of a stored relation. 

However, this very simplified account of the execution of 
the query does not discuss many important issues. One im¬ 
portant benefit of processing the relations locally is that it 
allows the lock management to also be processed locally, 
thus providing an important performance improvement. 

The Elipsys language 

Elipsys 10 is a parallel logic programming system for complex
applications. The system integrates Or-parallelism, constraint
satisfaction through finite domains, and an interface
to the EDS database server.

The particular combination of Or-parallelism and constraint- 
satisfaction problem-solving techniques, which prune the 
search space in an a-priori manner, provides an efficient
platform for executing search-intensive programs. Elipsys 
solves a typical combinatorial search problem—for example, 
graph coloring, scheduling, and some other related opera¬ 
tions research problems—in polynomial time. 

The syntax of the Elipsys programming language is de¬ 
rived from DEC-10 Prolog. Elipsys makes available to the 
programmer the following features: 

• Data-driven computation. This feature gives the 
programmer a flexible way of instructing the logic pro¬ 
gramming system in the way the paths of the search 
space can be computed. 

• Built-in constraints. We build in simple equalities and 
inequalities, linear equations, and optimized branch-and- 
bound techniques, which range over the domain of fi¬ 
nite discrete sets. 

• User-definable parallel constructs. Predicates can be an¬ 
notated as candidates for parallel evaluation and inter¬ 
face to the EDS database server through ESQL. 

The Elipsys execution model in Figure 7 combines a 
message-passing architecture for the control and scheduling 
of parallel work and a distributed, shared virtual address space 
for the implementation of the binding environment. This
combination permits Elipsys to execute efficiently under the
Emex kernel by taking advantage of the facilities provided
for task and thread management. It also uses the distributed
shared virtual memory scheme provided by Emex, which is kept
coherent by a “lazy, strong” method. This coherence scheme
does not perform any coherence maintenance operations by
default; explicit synchronization points in the application
code trigger the operations.

The Elipsys binding environment is both read-only and 
shared. A control Or-tree and a shared environment repre¬ 
sent the search space. A descendent Or-node inherits the 
binding environment of its ancestor Or-nodes. This inherited 
environment is read-only. Thus all the descendent Or-nodes 
hold the same view of the inherited environment. Modifica¬ 
tions to the shared environment occur through auxiliary 
structures, which are local to descendent Or-nodes. These
structures in turn become shared whenever a control Or- 
node gives rise to any descendent Or-nodes. 
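
The inheritance rule for binding environments can be illustrated with a short Python model. It is a sketch under assumptions, not the Elipsys implementation: the class `OrNode` and its methods `lookup`, `bind`, and `branch` are hypothetical names, and bindings are kept in plain dictionaries rather than in shared virtual memory.

```python
# Sketch of an Or-node binding environment; an assumption-laden model,
# not the Elipsys implementation. All names here are hypothetical.

class OrNode:
    def __init__(self, parent=None):
        self.parent = parent      # ancestor Or-node (inherited, read-only)
        self.local = {}           # auxiliary structure private to this node
        self.shared = False       # becomes True once the node has descendants

    def lookup(self, var):
        """Search local bindings first, then the inherited environment."""
        node = self
        while node is not None:
            if var in node.local:
                return node.local[var]
            node = node.parent
        raise KeyError(var)

    def bind(self, var, value):
        """Record a binding locally; inherited bindings are never modified."""
        if self.shared:
            raise RuntimeError("environment is shared and therefore read-only")
        self.local[var] = value

    def branch(self, n):
        """Create n descendent Or-nodes; this node's bindings become shared."""
        self.shared = True
        return [OrNode(parent=self) for _ in range(n)]
```

Two sibling nodes created by `branch` see the same inherited bindings but can record different local bindings for the same variable, which is what lets alternative branches of the search proceed independently.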

The message-based Elipsys control and scheduling 
mechanisms make use of the control Or-tree data structure, 
which is distributed over a set of workers. Each worker is 
allocated to one EDS processing element; a worker consists 
of a distributed scheduler, performing scheduling and con¬ 
trol functions, and a set of engines. An engine performs se¬ 
quential resolution steps, extended linear resolution with a 
selection function applied to definite clauses over finite do¬ 
mains. It also manages the interface to the EDS database 
server. A scheduler-engine interface describes the scheduling
policy, pruning, and input/output interactions between
the scheduler and the engine.

Advanced applications using Elipsys 

Elipsys is oriented toward complex applications. We are 
developing a suite of programs to demonstrate the practical¬ 
ity of Elipsys for a wide range of applications domains. These 
programs will highlight different design features of Elipsys: 

• Compatibility with existing applications. A civil engineering
program analyzes possible faults in concrete piles
from acoustic data. The University of Bristol in the UK is
parallelizing and porting this sequential Prolog program
to the Elipsys subsystem. This activity will identify the
potential problems that may be encountered when converting
existing Prolog applications to Elipsys.

• Capability for handling large data sets. The University of 
Athens is developing a tourist advisory system for Greece. 
The system provides customized holiday packages for 
individual tourists as well as general information for po¬ 
tential visitors to Greece. This application makes exten¬ 
sive use of the Elipsys connection to the EDS database 
subsystem. The raw tourist data can be stored in the 
EDS database rather than within Elipsys itself. 

• Deductive capability. System and Management S.p.A. in
Italy is currently implementing a Treasury Management System,
an expert system for banking. Its role is to suggest
profitable investment plans to bankers. Such an applica¬ 
tion is well suited for Elipsys and fully uses Elipsys’ in¬ 
herent deductive power and capability for interworking 
with the EDS database server. 

• Capability for managing complex data structures. ECRC
in Munich is developing several applications in the domain
of molecular biology. This domain requires the
management of vast amounts of complex data, again
using the Elipsys/database connection. Moreover, the
system requires complex symbolic processing (for
example, in matching the structures of different DNA
molecules) and the built-in constraint facility of Elipsys.

The application language 

Lisp is a powerful general-purpose programming language 
whose programs are written on a much higher level of ab¬ 
straction than the Pascal, Ada, and C procedural languages. 
Being so powerful, Lisp has been widely used for complex 
applications such as artificial intelligence. To this expressive 
power, we added the power of parallel processing in EDS 
Lisp. 11 

EDS Lisp extends the Common Lisp language. Because 
Common Lisp constitutes a de facto standard, users can eas¬ 
ily port most existing Lisp applications to the EDS machine. 

We selected the extensions of EDS Lisp after an intensive
study of other parallel Lisp systems. The extensions allow
access to the EDS database system, and they provide lan¬ 
guage constructs for explicit parallelism. Explicit parallelism 
enables the programmer to specify large-grain parallelism that 
is well suited to distributed-memory machines like the EDS.

An EDS Lisp program can have an indefinite number of 
parallel processes. The programmer creates processes to per¬ 
form some action in parallel and to return a value. The EDS 
system schedules these processes. EDS Lisp contains a single 
construct to spawn parallel processes, the Future construct 
known from other parallel Lisp dialects. 12 Future constructs 
support transparent use of results of parallel processes. The 
main idea is that a Future immediately returns an (initially 
empty) placeholder for the result of the spawned process. 
The spawning process can then continue operation. When 
some process accesses this result, it waits until the result is 
available and then continues operation. Both the placeholder 
mechanism and the implicit waiting are invisible to the pro¬ 
grammer. Consider, for example, the following piece of EDS 
Lisp code: 

(setq x (future f p1 ... pn))
which corresponds to

x := future (f (p1, ..., pn));



in a procedural programming style. A parallel process is
spawned using the Future construct to compute the function
f with parameters p1, ..., pn. The Future call immediately
returns a placeholder for the result of f and assigns it to the
variable x. The spawning process then continues in parallel
to the process computing f. If a process reads the variable x,
it tests implicitly whether the result is available and waits if
necessary.
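
For readers more familiar with current languages, a rough analogue of the Future construct can be written with Python's standard `concurrent.futures` module. This is an analogy, not EDS Lisp: the function `f` and its arguments are placeholders, and Python requires an explicit `result()` call where EDS Lisp waits implicitly on any access to the placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def f(*params):
    """Placeholder for the function computed in parallel."""
    return sum(params)


with ThreadPoolExecutor() as pool:
    x = pool.submit(f, 1, 2, 3)   # returns a placeholder immediately
    y = 10 * 2                    # the spawning thread keeps working
    total = y + x.result()        # waits here only if f has not finished
```

The difference in where the wait is written is the point of the comparison: in EDS Lisp both the placeholder and the waiting are invisible to the programmer.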

EDS Lisp also provides a Mailbox concept for communi¬ 
cation between processes and a Critical Section mechanism, 
among other things, for synchronized access to shared 
variables. 


Figure 8. Sequential (a) and parallel (b) translation on
Lisp systems.

Metal

The Metal machine translation system translates natural-
language documents 13 and currently requires a special-
purpose Lisp machine for production use. The EDS Lisp
application is complex enough to conquer both the CPU-
power and storage limitations of today’s workstations.

We expect a speedup of more than a factor of 300 for
running Metal on the EDS machine; the translation of 250
pages that needs 10 hours today should be accomplished in
two minutes on the EDS machine, as shown in Figure 8. This
performance increase is highly relevant for the application,
because the translation volume for technical documentation
is immense. The documentation for a complete technical
product line often amounts to several hundred thousand pages
that have to be made available in a multitude of languages.

Users can access the parallel processing power of EDS
Lisp not only to translate such a huge mass of text but also to
improve the quality of the translation by using more advanced
and therefore more resource-intensive algorithms.


The EDS project is a major, promising ESPRIT II collaboration,
sponsored by the Commission of the European Community,
between Bull, ICL, Siemens, and their jointly owned
ECRC research center. The EDS system primarily focuses on
the large-scale information server, which must manage infor¬ 
mation efficiently and effectively across the spectrum from 
data to knowledge. 

The EDS system enables programs calling the SQL, Lisp, 
and Elipsys interfaces to exploit large-scale parallelism es¬ 
sentially transparently. We’ve described the database and 
language aspects of the EDS system, looking at the SQL, Lisp, 
and Elipsys subsystems. 

The EDS project combines the complementary skills of its
partners and associate partners to achieve a clear and common
goal. At the end of the second of four years, the project
is on schedule to switch on an EDS machine in 1991 and
demonstrate applications in 1992.

in which the || operator in example i) specifies that statements
s1 to sn are to be performed in parallel. On one processor,
this operation may be mapped into a series of processes
activated in some undefined order. In ii), the ; operator
specifies sequential execution of the statements. Replicators
exist for simplifying repetitive parallel or sequential statements.
Examples iii) and iv) are alternatives to i) and ii).
Parle only provides for synchronous execution of processes
initiated by the || operator, in that these processes must all
terminate before the initiating program can continue to the
next statement.
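
The synchronous behavior of the parallel operator resembles a fork-join pattern. The following Python sketch shows that pattern only; it is not Parle, and the helper names `par` and `seq` are hypothetical. Statements are modelled as zero-argument callables and parallel processes as threads.

```python
import threading


def par(*statements):
    """Run zero-argument callables in parallel and wait for all of them."""
    threads = [threading.Thread(target=s) for s in statements]
    for t in threads:
        t.start()
    for t in threads:
        t.join()      # synchronous: continue only when every one has ended


def seq(*statements):
    """Run zero-argument callables one after another."""
    for s in statements:
        s()
```

As in the text, the caller of `par` cannot proceed until every parallel statement has terminated, and nothing is guaranteed about the order in which they run.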

The simplest conditional statement contains a single state¬ 
ment, which is executed when a guard condition is true. The 
most complex conditional statement contains multiple, 
guarded statements; the guards execute in parallel. When the 
statement associated with the first true guard executes, par¬ 
tially executed guards are discarded. 

Although intended as a compiler target language (CTL), 
Parle is probably a better parallel programming language. 
When it is used as a CTL, Parle’s weak typing places a 
large runtime checking burden on architectures not spe¬ 
cifically designed to support the architectural model. The 
process model also has limitations, especially the limited 
control over processes and their synchronous execution. 
Process creation is statically defined at compilation time, 
and processes once initiated run to completion. The model 
cannot suspend or delete a process once it is started because 
there is no mechanism by which an executing process can be
identified. Neither does the model support dynamic process 
creation and process migration. 

The process model is fine for a programming language, 
but it limits a CTL. The applications projects produced sub¬ 
stantial amounts of code in Parle, which was easy to use and 
effective. 

Virtual Machine Code 

The VMC fully realizes the Kernel System model at the 
level of machine code or assembler. It provides a model of 
the Kernel System that can be ported onto a variety of paral¬ 
lel architectures. Consequently, the VMC is a low-level lan¬ 
guage that reduces the work of performing the port. The 
philosophy behind the design of the VMC was again to provide 
the necessary components to support the Kernel System while 
limiting complexity. Thus, the VMC has 

• a reduced instruction-set style computational model. 

• a small, simple instruction set with which more complex 
operations can be implemented. (The exception is that 
list operations are members of the instruction set, as lists 
are primitive elements.) 

• only one addressing mode.

• a load/store philosophy extended by message-passing 
operators get and put. 

We assumed that all instructions would take equal time to 
execute (of course, this assumption cannot be maintained for 
list operators and nonlocal accesses at the hardware level). 

The VMC has a flat, unbounded address space of processors
identified by number: 1 to n. Each processor has a local
list-structured memory and a root pointer (called gp) to the 
top-level list. (See Figure 2 again.) A processor can randomly 
access any location within its local memory by address. An 
address is a list containing a starting point, or context, in the 
memory. It is followed by a sequence of selectors that specify 
how to traverse the memory from the context to the addressed 
memory element (for example, the five-element address list 
[gp 4 4 3 2] in Figure 2). Using gp as the context for an 
address provides an absolute addressing scheme. Other 
contexts can provide for relative addressing. 

A processor can access the memory local to any other 
processor through an extension of the addressing scheme. 
Thus, [gp 0 45 3 2 4] is the address of a memory element in 
processor 45. The 0 selector applied to gp is an escape 
mechanism to proceed from local memory into the address 
space of processors in which the following selector is used to 
identify the processor. Nonlocal memory locations can only 
be absolutely addressed. Note that the first element in all lists 
has the selector 1. 
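
The addressing rules can be made concrete with a toy Python model. It rests on assumptions that go beyond the text: list-structured memory is represented by nested Python lists, only the gp context is modelled, and the names `resolve`, `local_memory`, and `machines` are hypothetical.

```python
# Toy model of VMC addressing; the nested Python lists and the names
# used here are illustrative assumptions.

def resolve(address, local_memory, machines):
    """Follow an address list such as ["gp", 4, 4, 3, 2].

    local_memory: top-level list of the processor issuing the access
    machines:     dict mapping processor number to its top-level list
    """
    context, *selectors = address
    if context != "gp":
        raise ValueError("only the gp context is modelled here")
    element = local_memory
    if selectors and selectors[0] == 0:
        # Selector 0 escapes into the processor address space; the next
        # selector names the processor whose memory is addressed.
        element = machines[selectors[1]]
        selectors = selectors[2:]
    for s in selectors:
        if not isinstance(element, list) or not 1 <= s <= len(element):
            raise IndexError(f"selector {s} is outside the list")
        element = element[s - 1]      # the first element has selector 1
    return element
```

In this model an address of the form [gp 0 45 ...] is resolved against the memory of processor 45, while any other gp address stays within the local memory.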

The VMC provides for a fully list-structured, von Neumann 
memory with lists, integers, and the empty state as primitive 
data types. Shared memory and message-passing occur 
through the four memory operators (load, store, get, and 
put) in both local and nonlocal memory. The VMC maps 
both code and data into lists and manipulates them accord¬ 
ingly, as it also does for addresses. 

The VMC provides the standard arithmetic and logical 
operators for integers. It also provides a basic set of list op¬ 
erators similar to those in Parle. These operators 

• find the length of a list, 

• concatenate two lists, 

• form a sublist (either starting from the head of the list or 
ending at the tail of a list), or 

• create a new list. 

The list operators provide symbolic operations, while the 
addressing mechanism and memory operators provide the 
random-access capability required for numeric operations. 

The VMC has a control-flow model of execution. Its pro¬ 
gram counter identifies both the memory list and the element 
within this list from which the next instruction is to be fetched. 
Unconditional and conditional branch instructions allow for 
branching within the current code list or into another code
list within the processor’s local memory. Each processor has
an evaluation stack (similar to that of the transputer). The
VMC places data values on this stack when they are fetched
from memory. Operators find their operands on the stack
and return their results to that location. The VMC does not 
have the usual, numeric stack with unlimited depth. The return 
information from subroutine calls has to be explicitly moved 
from the evaluation stack into memory elements at the start 
of a subroutine. 

We wanted to again limit the complexity and give flexibility 
to system implementers in the handling of subroutine calls. 
However, the system needed a local data space for subrou¬ 
tine parameters, local subroutine variables, and return ad¬ 
dresses—and a relative addressing scheme to access the space. 
In particular, the system had to support recursive subroutines. 
The VMC provides for these needs by adding extra contexts 
that can identify sublists in the local memory. The sublist 
selection can be changed as required, say, on subroutine 
calls, to provide a new list for local data. 

The VMC implements copy semantics as described earlier. 
When a load is executed, the system copies the contents of 
the addressed element. It then places the copy (whether list, 
integer, or empty) on the evaluation stack. When a store occurs,
the item (list, integer, or empty) that is overwritten in
the addressed memory element is destroyed. Thus, no shar¬ 
ing of lists occurs in the VMC. The VMC does not require 
garbage collection for lists because they are immediately 
destroyed when overwritten or otherwise consumed. Although 
this mechanism simplifies the VMC, it has major implications 
for hardware supporting the VMC. During a write, for example, 
the system reads the value to be overwritten and checks it to 
see whether it is a list. If it is, the system deletes the list and 
frees the memory it occupies. 

The VMC has a simple multiprocessing model for process 
selection and execution. Only one process list exists. When a 
process is suspended—either through the suspend operator 
or a blocked message-passing operation—the executing 
process takes its place at the end of the process list. The first 
process on the list is removed and executed. When a process 
spawns a new process, the current process is placed at the 
start of the process list and the new process begins execution. 
No access is provided to the process list, and one cannot 
change the sequence of execution. 
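
The scheduling rule described above is simple enough to state as a few lines of Python. This is a sketch of the rule only, assuming a double-ended queue as the process list; the class `ProcessList` and its method names are hypothetical.

```python
from collections import deque


class ProcessList:
    """Single process list per processor, as described above (sketch)."""

    def __init__(self, first):
        self.current = first
        self.waiting = deque()

    def suspend(self):
        """Current process goes to the end; the first waiting one runs."""
        self.waiting.append(self.current)
        self.current = self.waiting.popleft()

    def spawn(self, new):
        """Current process goes to the start; the new process runs."""
        self.waiting.appendleft(self.current)
        self.current = new
```

A spawning process is therefore resumed before any process that was already waiting, whereas a suspended process waits behind all of them.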

The parallel processing model of the VMC allows proces¬ 
sors to operate independently and intercommunicate via 
operations on nonlocal memory. The VMC specifies nothing
else about the communications (no communications hardware,
protocol, or message structure is specified). The
VMC model of interprocessor communications is strictly one 
of accessing a memory element. The VMC passes the content 
of a memory element (list, integer, or empty). No restriction 
exists as to the complexity of a list that is being passed. (The 
model can copy all of a computing node’s local memory by
performing a load operation at a processor’s address.)

An example of VMC code that concatenates two lists and 
stores the result follows: 


[gp 1 5 6 88]      Load address onto evaluation stack
load               Perform load on element, copying contents to
                   evaluation stack; address is consumed and
                   removed from stack
[gp 63 7 12]       Load address onto evaluation stack
load               Perform load on element, copying contents to
                   evaluation stack; address is consumed and
                   removed from stack
cat                Concatenate two items from top of stack and
                   return result list to stack; operands are
                   consumed (error if not both lists)
[gp 9 12 3 6 5]    Load address onto evaluation stack
store              Store list into memory; address and list
                   are removed from stack
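
The effect of this sequence can be traced with a minimal Python interpreter for the three operators involved. It is a sketch under assumptions, not a VMC implementation: memory is a nested Python list, only gp-relative local addresses are handled, and the names `run`, `_locate`, and `EMPTY` are hypothetical.

```python
import copy

# Minimal interpreter for the three operators used above. The memory
# layout and the helper names are assumptions for illustration only.

EMPTY = None


def _locate(memory, address):
    """Return (parent list, index) for an address list with gp context."""
    context, *selectors = address
    assert context == "gp" and selectors
    parent = memory
    for s in selectors[:-1]:
        parent = parent[s - 1]            # selectors count from 1
    return parent, selectors[-1] - 1


def run(program, memory):
    stack = []
    for item in program:
        if isinstance(item, list):        # an address literal is pushed
            stack.append(item)
        elif item == "load":
            parent, i = _locate(memory, stack.pop())
            stack.append(copy.deepcopy(parent[i]))   # copy semantics
        elif item == "cat":
            right, left = stack.pop(), stack.pop()
            if not (isinstance(left, list) and isinstance(right, list)):
                raise TypeError("cat needs two lists")
            stack.append(left + right)
        elif item == "store":
            address, value = stack.pop(), stack.pop()
            parent, i = _locate(memory, address)
            parent[i] = value             # the old content is destroyed
        else:
            raise ValueError(f"unknown operator {item!r}")
    return stack
```

With a three-element memory `[[1, 2], [3], EMPTY]`, the program `[["gp", 1], "load", ["gp", 2], "load", "cat", ["gp", 3], "store"]` leaves the two source lists untouched, writes their concatenation into the third element, and ends with an empty stack.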


The major problem with the VMC is that its low-level nature 
makes it difficult to map onto most existing architectures. 
This characteristic also defeats its primary objective: easy 
porting of the Kernel System to various machines. However, 
the VMC does provide a very powerful and consistent ar¬ 
chitectural model that fully supports all the features of the 
Kernel System. In addition, its low-level nature (conversely) 
makes it an excellent architectural model for the development 
of a computer system especially targeted to support the Kernel 
System. Thus, the VMC became the model for the VLSI work 
package of SPAN that developed the Sprint processor de¬ 
scribed in the following section. 

Sprint architecture 

The aim of the VLSI work package was to design a parallel 
computer system that efficiently supported the Kernel System. 
We further constrained the work package to support the VMC, 
for a number of reasons. The VMC provided an excellent 
model to work with, and its low-level nature restricted the 
search space of the designers. The VMC’s reduced instruction 
set also fit well with the design and development approaches 
feasible for a team of three. 6 

We targeted the design not at a machine that directly 
modeled the VMC, but to a functionally equivalent machine 
to which VMC programs could be easily mapped. Also, we 
did not expect that all the VMC features would be directly 
supported by hardware. (In fact, the hardware does not 
support the list creation, extension, deletion, and compari¬ 
son operations.) Our primary design objectives were to sup¬ 
port most of the features of the VMC model as efficiently as 
possible. These features included its list-structured, von 
Neumann memory with shared memory and message-passing
operators, and an operand stack.

We hoped to obtain as much hardware support as pos¬ 
sible and reasonable with a team of three and a limited budget. 
The design allowed those features not directly mapped into 
hardware to be easily and efficiently mapped into software. 

Our secondary objectives were to expand the design 
(without restricting any of the primary objectives) to make it 
as general purpose as possible. We wanted to 

• support different programming styles, 

• provide an architectural environment suitable for a 
complex operating system (such as protection, flexible 
task scheduling, interrupt support, etc.), and 

• provide a robust environment to limit the effects of incorrect
programs (such as hardware type checking of
instruction operands and hardware bounds checking to ensure
that a list access is within the length of the list).

An important overall design objective (especially to the 
designer) was efficient, fast operation. To this end, we con¬ 
centrated on minimizing the following: 

• the processor clock period, 

• the number of clock cycles per instruction, 

• the memory-access time, and 

• the number of these accesses. 

These elements are particularly important with respect to 
the implementation of list-structured memory. 

We initially divided the design of the computing node into 
processor, list-structured memory, and communications. We 
developed a novel algorithm to support the list-structured 
memory so that it could be implemented within the proces¬ 
sor. Thus, we only needed to develop two components for 
the computing node: a communications chip and a reduced
instruction-set computer chip. 6,7 

List-structured memory. The implementation of the list- 
structured memory of the VMC was a major issue. It affected 
the development of many other parts of the project. This 
implementation is central to the provision of a computer that 
efficiently executes VMC programs and serves as a fast engine 
for both symbolic and numeric processing. In particular, it 
was necessary to furnish a true random-access capability for 
such numeric processing as array and structure accessing, 
and instruction branching. We looked for a list-memory 
scheme that could be implemented on a standard, linearly 
addressed, physical memory (readily available at low cost 
and high densities). By introducing a level of indirection, a
memory element is no longer required to hold a list itself.
Instead, a memory element only needs the capability of
holding a pointer to a list. The major remaining memory
problem is the structure to which the pointer refers.

Figure 3. Mapping a list into physical memory pages.

We developed a novel page-based scheme from knowledge
of page-based memory management. 8 We chose a paged
scheme for the same reason it is used in memory manage¬ 
ment: the ease of free space management in a system with 
dynamic memory allocation. The tree-structured scheme 
supports random access to all list entries. It also supports list 
extension at the head and tail of the structure without having 
to copy the list. Each list requires its own instance of a tree 
structure. Its memory elements consist of the leaves of the 
tree, grouped into pages. The branch nodes in the tree are 
page pointers down the tree structure, also grouped into pages 
(see Figure 3). At the top of the tree is a single page with 
pointers to lower level pages. The number of levels between 
the top of the tree and the leaves depends on the list size and 
the page size. The tree structure is referenced by a list pointer, 
which contains the address of the top-level page, the length 
of the list, and the depth of the tree structure. The architecture 
provides for the existence of only one list pointer to a tree 
structure because: 

• the list length is stored in the pointer, 

• the system uses on-the-fly garbage collection, and 

• the list-structured memory model does not support the 
merging of the memory structure as would occur with 
storage of multiple copies of a list pointer. 

One such tree structure represents each list, although the 
number of levels in the structure varies with the list length. 
Consideration of the memory element size led to restrictions 
on the maximum list structure and the maximum list length. 
We selected a page size of 32 elements. A larger size would 
have led to fragmentation, since many lists will be short. A
smaller size would have led to deeper structures and more
memory accesses. The page size, however, did lead to a 
restriction of a maximum of three levels in the list structure 
and a maximum list length of 32,768 elements. This arrange¬ 
ment limits to 15 bits the subscript needed to locate an entry 
in a list. An entry is located as follows: 

• divide the subscript into three 5-bit fields (A, B, and C in 
Figure 3), 

• use field A to select an entry in the first-level page, 

• use the address produced with field B to select an entry 
in a second-level page, and 

• use the address produced with field C to select an entry 
in a third-level page. 
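
The lookup steps above amount to slicing a 15-bit subscript into three 5-bit page indices. A Python sketch follows, under assumptions not stated in the text: subscripts are zero-based, field A is the most significant, pages are modelled as nested Python lists, and the names `split_subscript` and `lookup` are hypothetical.

```python
PAGE_BITS = 5
PAGE_SIZE = 1 << PAGE_BITS        # 32 entries per page
MAX_LEVELS = 3
MAX_LENGTH = PAGE_SIZE ** MAX_LEVELS   # 32,768 elements, a 15-bit subscript


def split_subscript(subscript):
    """Split a 15-bit subscript into the 5-bit fields A, B, and C."""
    if not 0 <= subscript < MAX_LENGTH:
        raise IndexError("subscript does not fit in 15 bits")
    a = (subscript >> 10) & 0x1F
    b = (subscript >> 5) & 0x1F
    c = subscript & 0x1F
    return a, b, c


def lookup(top_page, subscript):
    """Walk a three-level tree of pages (modelled as nested lists)."""
    a, b, c = split_subscript(subscript)
    second_level = top_page[a]
    third_level = second_level[b]
    return third_level[c]
```

Each access to a full three-level list costs three page reads, which is why the text weighs page size against tree depth.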

Tagged architecture. We created different data types by 
placing a tag on each and every memory location. This en¬ 
abled the three VMC data types—integer, list, and empty—to 
be correctly identified. The decision to have a tagged archi¬ 
tecture greatly influenced the further development of the 
design. It allowed the separation of list pointers from integers 
and runtime hardware type-checking of instruction operands 
with no overhead. This method also enabled a number of 
other mechanisms to be implemented in hardware to provide 
efficient coding and reduced runtime overhead. 

Tagging required a larger memory element size than 32 
bits, since a 32-bit integer capability was deemed essential. 
The components included in the list pointer—page address, 
length, and level structure—further increased the size of the 
memory element to 40 bits. The level structure, combined 
with the tag, produced a 3-bit tag field and eight data types. 
This provided list data types for one, two, and three-level list 
structures, in addition to integer, empty, and three extra data 
types. One of the extra types marks the first element in an 
address list—the reference point—so that address lists can 
be identified from data lists to provide runtime checking of 
addresses. The other two extra types are reserved for users. 
Hardware performs conditional checks on these types, among 
others. 

Figure 4 shows the structure of the list pointer. The provision 
of a tag field and a number of data types led to a more robust 
environment through runtime checks on data objects, ad¬ 
dresses, and instruction operands. This approach reduced 
the complexity of both the processor and data management 
by making the data types distinct. No operations to the data 
field of any object could change it into a different data type. 
The tag field also helped simplify and shorten program code. 


List pointer fields: Copy (1 bit), Tag (3 bits), Length (15 bits),
Physical page address (16 bits), Offset (5 bits)


Figure 4. Structure of the list pointer. 

Using only a 3-bit tag limits the hardware expense of providing
the runtime checking facilities.
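
The field widths in Figure 4 add up to the 40-bit memory element (1 + 3 + 15 + 16 + 5 = 40). The Python sketch below packs and unpacks such a word. The widths come from the figure, but the bit order (Copy in the most significant position) and the names `FIELDS`, `pack`, and `unpack` are assumptions made for illustration.

```python
# Field widths are from Figure 4; the bit order (Copy in the most
# significant position) is an assumption made for this sketch.
FIELDS = [("copy", 1), ("tag", 3), ("length", 15), ("page", 16), ("offset", 5)]
WORD_BITS = sum(width for _, width in FIELDS)    # 40


def pack(**values):
    """Combine the named fields into one 40-bit integer."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Split a 40-bit integer back into its named fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

Keeping the tag in a fixed position means a type check needs to inspect only three bits of the word, whatever the rest of it holds.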

Modes of operation. Early in the design, we saw that many 
VMC operations could not be easily implemented in hard¬ 
ware and should be supported in software. These tasks in¬ 
cluded complex list operations (except obtaining the list 
length), multiply and divide (because of the silicon area re¬ 
quired), task scheduling, and register save/restore operations 
on task-switching. The system also had to support exception 
handling for such functions as blocking put and get operations 
and process-switching, list-bound checking exceptions, in¬ 
struction tracing, and interrupt handling. These requirements 
led to the provision of three modes of operation: executive, 
supervisor, and user. 

In the executive mode, which accesses the whole system, 
the system inhibits all exceptions and interrupts. The major 
roles of the executive mode are to handle task-switching— 
whether generated by hardware or software exception, or by 
interrupt—and to organize the task scheduling and initiation. 
Software must perform the latter to give greater flexibility to 
the scheduling algorithm. 

The processor contains a timer register with an interrupt 
signal to allow for time-sliced scheduling. The executive mode 
has its own set of registers, including code pointer registers. 
Consequently, no state must be saved by either software or 
hardware on entry to executive mode. This arrangement 
simplifies the hardware design and provides a fast switch to 
and from the mode, which can execute very short sequences 
of code without interruption or large overhead. 

The fact that no interruption of executive mode can occur 
limits the code that can execute in it and requires that code 
to be fully tested and reliable. Thus, the executive mode 
serves key functions only. 

To provide for the rest of the operating system, the 
nonexecutive mode subdivides into supervisor and user 
modes. These modes are identical in terms of the accessible 
instructions and registers, differing in just two ways. First, 
interrupts can be masked in supervisor mode only. Second, 
the software-exception vectors generated by the two modes 
subdivide into two disjoint sets—even vectors to user mode 
and odd vectors to supervisor mode. Because one mode 
cannot generate the software-exception vectors of the other, 
the modes are completely separate. Each mode enters the 
executive mode at a different point, which obviates privilege 
checks before code execution to simplify and speed operation. 

We used the odd/even division to locate all the entry points
close together at the start of a list when only a small number
of vectors is in use. We envisaged an operating system that
would mainly execute as tasks in supervisor mode.
These tasks would be scheduled in the same way as user 
tasks. The system would enter executive mode to execute 
short sections of essential code, select a new task to execute, 
and switch back to nonexecutive mode. 


The hardware allows for up to 16K software vectors for 
each mode, although the executive mode sets the maximum 
that can be generated upon entering the nonexecutive mode. 
This large number of software vectors provided great flex¬ 
ibility and was easy to implement with hardware already 
present for other purposes. 
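
The odd/even split can be pictured with a small Python sketch. The
text states only that user mode generates even vectors and
supervisor mode odd ones, up to 16K per mode, with a maximum set by
the executive. The mapping from a per-mode number to a table slot
used here, and the function name vector_slot, are assumptions for
illustration.

```python
VECTORS_PER_MODE = 16 * 1024  # "up to 16K software vectors for each mode"


def vector_slot(mode: str, number: int, limit: int = VECTORS_PER_MODE) -> int:
    """Map a per-mode exception number to an interleaved entry-point slot.

    `limit` models the maximum set by the executive mode on entry to the
    nonexecutive mode (assumed here to apply equally to both modes).
    """
    if not 0 <= number < min(limit, VECTORS_PER_MODE):
        raise ValueError("vector number exceeds the limit set by the executive")
    if mode == "user":
        return 2 * number          # even slots belong to user mode
    if mode == "supervisor":
        return 2 * number + 1      # odd slots belong to supervisor mode
    raise ValueError(f"unknown mode: {mode}")
```

With a small limit, all entry points of both modes fall in the first
2 x limit slots, and neither mode can produce a slot that belongs to
the other.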

Instructions and registers. Access to elements in the 
memory requires the traversal of the list memory structure 
through translation of logical list addresses such as [gp 3 5 2] 
to physical memory addresses. We designed the processor to 
perform this translation process as efficiently as possible. The 
processor has 32 instructions, three of which are multicycle 
instructions that perform address translation. The most com¬ 
plex of these instructions translates a full logical address in 
an indivisible sequence to locate and read the addressed 
memory location. We see these multicycle instructions as 
essential to keeping the memory access times as short as 
possible and to maintaining memory consistency throughout 
the address translation process. They are developed on top 
of the single-cycle instructions, which act as a form of mi¬ 
crocode. Consequently the multicycle instructions do not 
compromise the processor speed and only add a small amount 
to chip size and complexity. 
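
The translation of a logical list address can be sketched in Python
as a walk through nested lists. This is a software model only: the
tuple encoding of elements, the zero-based selectors, and the names
read and ListBoundError are assumptions for illustration, and the
sketch ignores the indivisibility that the multicycle instructions
provide in hardware.

```python
class ListBoundError(Exception):
    """Stands in for the list-bound checking exception."""


def read(memory, context, selectors):
    """Follow `selectors` from the list pointer `context` and return the
    addressed element.

    memory    : dict mapping a physical page address to a list of elements
    context   : ("list", page, length), e.g. the contents of register gp
    selectors : sequence of indexes, e.g. [3, 5, 2] for [gp 3 5 2]
    Elements are ("int", value) or ("list", page, length).
    """
    element = context
    for index in selectors:
        if element[0] != "list":
            # Tag check: only a list pointer can be selected into.
            raise TypeError("selector applied to a non-list element")
        _, page, length = element
        if not 0 <= index < length:
            raise ListBoundError(f"selector {index} outside list of length {length}")
        element = memory[page][index]
    return element
```

For example, with gp = ("list", 10, 2) and suitable pages, the call
read(memory, gp, [1, 1, 2]) performs three bounded lookups, one per
selector, before returning the final element.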

We used a small register set to restrict the process state 
and the physical layout. The size of the physical layout pre¬ 
vented the use of register windows. (The architectural model 
did not suit their use, anyway). We employed 28 addressable 
registers and 32 register references, since the latter do not all 
refer to different or distinct registers. A subset of 16 registers 
is exclusively available for the nonexecutive mode. Four of 
the general registers can function as either ordinary registers 
or a four-element stack used by the VMC. Two register ref¬ 
erences allow an item to be pushed or popped from the 
stack. The flags register holds information about the current 
top of stack and the stack depth. Four other register references 
allow any of the four stack elements to be accessed if it 
exists. All stack violations cause a hardware exception. If the 
stack size is set to a maximum and the push and pop refer¬ 
ences are not used, the registers can function normally. 
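
A minimal Python model of the four-register stack follows. It
assumes that the four element references are numbered from the top
of the stack and that any violation raises an exception; the class
and method names are hypothetical.

```python
class StackViolation(Exception):
    """Stands in for the hardware exception raised on a stack violation."""


class RegisterStack:
    SIZE = 4  # four general registers double as the stack

    def __init__(self):
        self.registers = [0] * self.SIZE
        self.depth = 0  # the flags register records this in the real design

    def push(self, value):
        if self.depth == self.SIZE:
            raise StackViolation("push on a full stack")
        self.registers[self.depth] = value
        self.depth += 1

    def pop(self):
        if self.depth == 0:
            raise StackViolation("pop on an empty stack")
        self.depth -= 1
        return self.registers[self.depth]

    def element(self, n):
        """Read the n-th element counted from the top (0 is the top)."""
        if not 0 <= n < self.depth:
            raise StackViolation("referenced stack element does not exist")
        return self.registers[self.depth - 1 - n]
```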

We adopted a 16-bit instruction size, which led to the choice 
of 32 instructions and 32 register names. This approach en¬ 
abled two instructions, called an instruction pair, to reside in 
one memory element. A 4-bit condition code that further 
controlled the execution and interpretation of the instruction 
pair accompanied it. One can use 5- and 16-bit literals
(constants) with any instruction. In the latter case, the literal
replaces the second instruction of a pair. Because the instruction
set and register set are orthogonal, no hardware restrictions exist
for their use—although some combinations are not sensible 
programming. 
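
The arithmetic behind the instruction pair is simple: two 16-bit
instructions plus a 4-bit condition code occupy 36 bits of a memory
element. The Python sketch below packs and unpacks such a pair. The
placement of the condition code above the two instructions, and the
function names, are assumptions; the text does not give the bit
layout.

```python
INSTRUCTION_BITS = 16
CONDITION_BITS = 4


def pack_pair(condition: int, first: int, second: int) -> int:
    """Pack a condition code and two instructions into 36 bits.

    `second` may also hold a 16-bit literal, which the text says
    replaces the second instruction of a pair.
    """
    for value, width in ((condition, CONDITION_BITS),
                         (first, INSTRUCTION_BITS),
                         (second, INSTRUCTION_BITS)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
    return (condition << 32) | (first << 16) | second


def unpack_pair(word: int):
    """Return (condition, first, second) from a packed 36-bit value."""
    return (word >> 32) & 0xF, (word >> 16) & 0xFFFF, word & 0xFFFF
```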

Placing two instructions into a memory element allowed
us to improve processor operation by making the execution
of the pair indivisible by interrupts, bus arbitration requests,
and shared bus accesses. Both hardware and software
exceptions are taken between the instructions of a
pair. This mechanism treats the instruction pair almost like
the single instruction in more traditional architectures, giving
a somewhat greater complexity to the instruction set.
The initial and major reason for doing this was to provide 
indivisible memory operations. These operations allow a 
list address to be evaluated and a memory element to be 
read and modified (necessary for get and put) without any 
possibility of invalidation by some other memory operation. 
(Operations via direct memory access or task-switching could 
otherwise invalidate the address translation or modify the 
addressed memory element under some circumstances.) 

The instruction-pair mechanism allows memory opera¬ 
tions to be handled by an address-translation instruction, 
followed by a second data-movement instruction. The 
address-translation instruction locates and reads the addressed 
memory. The second instruction moves the read data to a 
register or writes a new value to memory. All memory op¬ 
erations include an initial read of the addressed memory, 
followed by a second instruction to move the read value to 
a register or to perform a write to memory. 

Our approach simplified and reduced the necessary in¬ 
structions and their complexity, while still allowing a more 
powerful set of operations to be performed via instruction- 
pair combinations. 

The instruction pair also allows for an alternative form of 
conditional branching that doesn’t break the instruction-fetch 
pipeline. This test-and-skip instruction either conditionally 
executes the second instruction or skips over it. The test- 
and-skip instruction and the more usual conditional branch 
instruction have a combined repertoire of 32 condition tests. 
Consequently, we kept these instructions even when a more 
powerful technique of conditional instruction execution was 
developed. 

This technique uses the 4 spare bits in the integer element 
to provide 16 tests on whether to execute all or part of the 
associated instruction pair. The tests provide for executing 
both instructions of the pair or only the first one (for in¬ 
stances in which a no-operation instruction and a wasted 
cycle are required). More important, this technique can treat 
the instruction as a 32-bit literal and load it onto the evalu¬ 
ation stack without execution, which is useful for bringing 
in 32-bit masking patterns. The remaining tests are all con¬ 
ditionals on various flags or tag fields. This capability has 
proved useful in creating nonblocking put and get instruc¬ 
tions. It enables the blocking action to be performed in 
conditionally executed software (a single instruction pair) 
with minimal overhead. 
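
The effect of the condition code on an instruction pair can be
sketched as follows. The text describes four kinds of test (execute
both, execute only the first, treat the pair as a 32-bit literal,
and conditionals on flags or tag fields) but not their encoding, so
the sketch uses symbolic test names. Treating the first instruction
as the high half of the literal, and running both instructions when
a flag test succeeds, are assumptions.

```python
def run_pair(test, first, second, flags, stack, execute):
    """Apply one condition test to an instruction pair.

    test    : "both", "first_only", "literal", or the name of a flag
    flags   : dict of flag name -> bool
    stack   : list modelling the evaluation stack
    execute : callable invoked for each instruction that runs
    """
    if test == "literal":
        # Load the pair as data instead of executing it.
        stack.append((first << 16) | second)
    elif test == "both":
        execute(first)
        execute(second)
    elif test == "first_only":
        execute(first)
    elif flags.get(test, False):
        # Conditional pair: runs only when the named flag is set.
        execute(first)
        execute(second)
```

A blocking put or get could then be expressed as an ordinary
nonblocking operation followed by one pair whose test names a
(hypothetical) "blocked" flag.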

The tagged architecture also allows noninteger data objects
to be placed in the instruction stream. The system
identifies these objects by their tags and moves them to the
top of the evaluation stack. Instructions are tagged as integers.
This process requires trivial amounts of hardware and
speeds the loading of these objects. It is particularly useful
for fetching list addresses for memory operations. One can
just insert the list pointers into the code.

Addressing. We intend user mode programs (and the 
bulk of any operating system software) to use the list ad¬ 
dressing scheme exclusively. This approach allows physical 
addresses to be formed and consumed within indivisible 
operations so that memory consistency is rigorously main¬ 
tained. Because of the dynamic memory structure, if a list is 
copied or even extended, the mapping of all list locations— 
and possibly of sublists—to physical memory can change. 
Thus no physical addresses, except within a list pointer, 
should be a part of the program state. Even on a subroutine 
call, the saved value for return should be a list address. 
(The actual value is programmable, since no subroutine call 
exists and the system has to build it on top of branch in¬ 
structions.) The executive software makes use of both address 
forms. 

The use of list addresses even extends to the code pointer 
registers. A list pointer register and a selector register identify 
the current code list and the next instruction within this list. 
However, to speed up sequential program fetching, a regis¬ 
ter that holds the physical address of the next instruction is 
kept consistent with the code pointer registers. In many 
instances, the system can use a physical address from this 
register, which removes the need for a list translation and 
reduces overhead. 

Sprint has an expanded number of contexts. The archi¬ 
tecture allows any register that can hold a list pointer to act 
as a context in a list address. This function provides greater 
flexibility and a choice of absolute or relocatable address¬ 
ing for different compilation models. It also allows the 
movement of tasks within or between processors. Sprint 
provides absolute addressing by referencing all addresses 
from gp. Relative addressing requires the assignment of some 
fixed point or points in a program. 

Figure 5 on the next page shows how a program in the C 
programming language might be mapped into a list struc¬ 
ture as a single list with sublists. All global objects, data, 
and functions reside in the main program list (called program
list in Figure 5). This program could be stored in any location
in the list structure memory of a processor. A relative
addressing scheme would place the list pointer to the top- 
level list in a register. Sprint provides a register sd for this 
purpose (not shown in the figure). All locations within the 
program can be addressed relative to this. For instance, [sd 
88] can reference some global data object, while [sd 44 15] 
can locate an entry point in a subroutine. Provided that 
physical addresses are not saved in memory, the program 
can move to any part of memory—even to another proces¬ 
sor—when the program itself is not executing. 

Figure 5. C-program structure mapped into lists.

Copy semantics and list deletion. We designed Sprint
to support copy semantics with two forms. The copy-on-load
form of the VMC copies a list as soon as it is brought
from memory by a load operation. A copy-on-store form 
delays the copying of a list until it is absolutely necessary, 
which reduces the amount of copying. For example, the 
latter form would not copy a list that is read into the processor 
just to find its length. Hardware exceptions support both of 
these forms. For copy-on-store, the system also provides a 
copy bit in the list pointer to record that a list should be 
copied when it is stored to memory. 

If a write to memory overwrites a list pointer, the system 
generates an exception so that the underlying list can be 
discarded to free its pages. No other form of garbage collec¬ 
tion occurs. 
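
The copy-on-store form and the deletion rule can be modelled in a
few lines of Python. This is a sketch under stated assumptions: that
a load sets the copy bit on the pointer held in the processor, that
a store of such a pointer copies the list into a fresh page, and
that overwriting a list pointer frees only the page it names
(sublists are not followed here). ListRef and ListMemory are
hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListRef:
    page: int
    copy: bool = False  # models the copy bit of the list pointer


class ListMemory:
    def __init__(self):
        self.pages = {}
        self._next = 0

    def allocate(self, items):
        page = self._next
        self._next += 1
        self.pages[page] = list(items)
        return page

    def load(self, page, index):
        """Read an element; a list pointer is marked copy-on-store."""
        element = self.pages[page][index]
        if isinstance(element, ListRef):
            return ListRef(element.page, copy=True)  # no copy made yet
        return element

    def store(self, page, index, value):
        """Write an element, copying a marked list and freeing any
        list whose pointer is overwritten."""
        if isinstance(value, ListRef) and value.copy:
            value = ListRef(self.allocate(self.pages[value.page]))
        old = self.pages[page][index]
        self.pages[page][index] = value
        if isinstance(old, ListRef):
            self.pages.pop(old.page, None)  # discard the overwritten list
        return value
```

Loading a list only to inspect it (for instance, to find its length)
allocates nothing; the copy happens only if the pointer is later
stored.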

Production. The processor was manufactured on a 2- 
micrometer silicon chip with a total area of 70 sq mm. It 
subdivides into a data path with all the registers and func¬ 
tional units, and a control part made up of finite-state ma¬ 
chines and instruction decoders. We designed the processor 
to operate from a 10-MHz clock with a maximum instruc¬ 
tion execution rate of 5 MIPS (million instructions per sec¬ 
ond). The processor pipelines instruction prefetch cycles 
with instruction execution. Memory accesses can execute in 
one instruction cycle. We haven’t completed testing the de¬ 
vice. Further details on the hardware implementation ap¬ 
pear elsewhere. 6 - 9 

Communications. The bulk of project activity concerned
the processor, while the remaining effort went into developing
a communications system. The aim here was to develop
a simple, regular, extensible system around a switch
component, all on one chip. We looked for a parallel
transmission system rather than a serial one. We rejected
bus-based systems on the basis that a multiple-bus system
was too complex and a single-bus system would limit the
number of processors.

We developed a message-passing network similar to many
packet-switching networks, except that the packet length is
variable and unlimited. We designed a linear network around
a single, three-interface switch element. The processor
connects to one interface, while the other two interfaces
connect to similar switches. Each interface contains two
unidirectional, 11-bit channels: one for transmission and one
for reception. Each channel operates independently, and the
system is free of deadlocks. Single-item buffering occurs on
each channel, and a message in transit is spread over the
network. The interface between the switch and the local
processor is memory mapped into the processor’s physical
address space. The interface automatically performs
translations between 40-bit processor words and the 11-bit
format used on the network.
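
The text does not describe the 11-bit network format. Purely as an
illustration of how a 40-bit word could travel over an 11-bit
channel, the Python sketch below assumes four items per word, each
carrying 10 data bits plus one flag bit that marks the last item of
the word. Both the split and the flag are assumptions, not the
Sprint format.

```python
DATA_BITS = 10        # assumed payload per 11-bit item
ITEMS_PER_WORD = 4    # 4 x 10 = 40 bits


def word_to_items(word: int):
    """Split a 40-bit word into four 11-bit items, most significant first."""
    if not 0 <= word < (1 << 40):
        raise ValueError("word does not fit in 40 bits")
    items = []
    for i in range(ITEMS_PER_WORD):
        shift = DATA_BITS * (ITEMS_PER_WORD - 1 - i)
        chunk = (word >> shift) & 0x3FF
        last = 1 if i == ITEMS_PER_WORD - 1 else 0  # assumed end-of-word flag
        items.append((last << DATA_BITS) | chunk)
    return items


def items_to_word(items):
    """Reassemble a 40-bit word from four 11-bit items."""
    word = 0
    for item in items:
        word = (word << DATA_BITS) | (item & 0x3FF)
    return word
```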

Because the linear network is not an optimal choice for a 
large system, a two-dimensional grid system was our first 
choice. We chose the linear network, however, because it 
was easy to develop and because of pinout limitations on 
the final design. 

We achieved all the goals of the VLSI work package. We 
implemented a design around the VMC that provides a con¬ 
sistent list-structured memory for both code and data with 
random access to any element in the memory. The design is 
fast, although not yet at the level of recent 20-plus-MIPS 
machines. Memory access speeds, although not always as 
fast as for a traditional memory, are still very good. The 
architecture will fully support an operating system and is 
broad enough to allow a variety of programming models to 
be mapped onto it. We found the use of a tagged architecture 
to be an especially decisive factor in producing a system 
with low overhead and improved reliability and versatility. 
In particular, tagging—along with the complex exception 
system—provides reduced processing overhead on function- 
operand checking and special processing for particular data 
types. 

The DICE architecture 

The motivations for the development of the DICE archi¬ 
tecture are somewhat unusual. Most parallel architectures 
are geared for the fastest execution of a single program 
(with several processes). They offer all the available processors 
on an equal basis for concurrent execution. Instead, DICE 
is mainly oriented to the user, dedicating several processors 
to each user and giving absolute priority to that user’s pro¬ 
cessors. We refer to these philosophies as program parallelism 
and user parallelism. 

Many parallel architectures are designed for batch pro¬ 
cessing with the execution of one program. This program 
usually consists of several modules (each with several pro¬ 
cesses). Although the modules may be compiled separately, 
they are bound and allocated to processors in a combined 
way. This is program parallelism. 

Many problems require this type of architecture. How¬ 
ever, demand is increasing for highly interactive, general- 
purpose, multiuser computers that people can use daily for 
all types of tasks. A growing trend is the use of worksta¬ 
tions, interconnected in a local area network, whose main 
advantage is to provide each user with one dedicated pro¬ 
cessor. This is user parallelism. 

Making both types of parallelism available without hav¬ 
ing to simulate user parallelism is a fundamental feature in 
any successful, parallel, multiuser computer. However, par¬ 
allel architectures with unshared memory have not reached 
this level of sophistication. 

The main goal of the DICE distributed-memory architec¬ 
ture is to uniformly concentrate the functions of several net¬ 
worked workstations and/or program-oriented parallel 
computers in one parallel machine. In this approach, each 
user can have several dedicated processors rather than just 
one. Because interprocessor communication is much faster
than in a local area network, the machine can exploit
program parallelism efficiently as well.

The computing environment must be as dynamic as that 
of current shared-memory computers, eliminating the tradi¬ 
tional view of a host controlling an array of load-and-go 
processors. Each processor must have a resident kernel that 
provides the traditional services found in a uniprocessor. The 
kernel must also provide, among other things, 

• interprocessor communication, 

• dynamic load balancing involving automatic data and 
code migration, and 

• a distributed virtual memory system with support for 
multiple copies (in different processors) of a given 
memory area. 

Incremental extensibility. Parallel architectures cannot 
have a fixed configuration. The number of resources in the 
system and their configuration must be set according to the 
user’s needs, which naturally vary with time. On the other 
hand, the increment (minimum change) should be very small 
to avoid discontinuities in the investment. Machines based
on hypercube configurations that contain power-of-2 numbers
of processors are not incrementally extensible. The
number of processors should increase according to the growth 
of the number of users or of their needs, not in powers of 2. 

Software/hardware independence. Extensibility must 
not affect existing software. A machine in which the addi¬ 
tion of a processor implies recompilation becomes awkward 
in a multiuser environment. The software (machine code) 
must run unchanged in every configuration, from the smallest 
to the largest, using whatever resources happen to be avail¬ 
able. The software should also discover these resources au¬ 
tomatically and dynamically. This independence, coupled 
with the extensibility, means that all configurations—from 
the PC to the mainframe—must run exactly the same code. 

System organization. Interconnecting networks such as 
the hypercube have very limited extensibility. Solutions such 
as a tree or a mesh are preferable from this point of view, 
because the fixed number of connections per processor does 
not depend on the number of processors in the architecture. 

Given the preference for user parallelism and extensibil¬ 
ity, the DICE architecture adopted the tree as the intercon¬ 
nection topology. If a system has a small number of processors 
(generally up to 16), a bus serves as an extensible network 
par excellence. All processors connect from a minimum 
distance and can be added or removed without affecting 
the others. When local caches cannot avoid bus saturation, 
a higher number of processors can be organized into groups. 
They can connect to a communication element (CE) that 
then connects to an upper bus, forming an N-ary tree. The 
tree does not have to be balanced. Figure 6 on the next 
page depicts a typical configuration. 

Besides permitting extensibility, the tree provides: 

• favored local communication within each group, 

• easy broadcasting of messages, 

• simple message-routing performed in a decentralized 
manner locally to each CE, and 

• a natural hierarchy. For instance, all processors in a 
subtree can share a disk located at the apex of that 
subtree. 
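
The decentralized routing rule can be sketched in Python: each CE
forwards a message downward if the destination lies in one of its
subtrees and upward otherwise. The class layout and names below are
assumptions for illustration; a real CE would presumably hold a
routing table rather than recompute the set of reachable nodes.

```python
class CE:
    """Communication element heading a group of PEs and/or other CEs."""

    def __init__(self, name, children):
        self.name = name
        self.children = list(children)  # PE or resource names (str), or CEs
        self.parent = None
        for child in self.children:
            if isinstance(child, CE):
                child.parent = self

    def reachable(self):
        """Names of all leaves below this CE."""
        found = set()
        for child in self.children:
            found |= child.reachable() if isinstance(child, CE) else {child}
        return found

    def next_hop(self, destination):
        """Local decision: go down if possible, otherwise go up."""
        for child in self.children:
            if child == destination:
                return child
            if isinstance(child, CE) and destination in child.reachable():
                return child
        if self.parent is None:
            raise LookupError(f"{destination} is not in the tree")
        return self.parent


def route(ce, destination):
    """List the hops from `ce` to `destination`."""
    path = [ce.name]
    hop = ce.next_hop(destination)
    while isinstance(hop, CE):
        path.append(hop.name)
        hop = hop.next_hop(destination)
    path.append(hop)
    return path
```

Traffic between two PEs of the same group never leaves that group's
CE, which is the "favored local communication" property listed
above.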

Of course, traffic congestion occurs near the apex of the 
tree. However—due to the preference for user parallelism— 
this should constitute no real problem. Most of the traffic is 
local to each subtree. A natural organization is to assign a 
group of processors (headed by a CE) to each user, allowing
the user to resort to a neighbor’s processors whenever
they are unused. Resources that are permanently shared by
several users (they appear in the tree as processors with
special capabilities, such as disk 1 and disk 2) are located in
upper levels in the tree. Disks and printers fall into this
category. The greater the number of users who share a given
resource, the higher its position in the tree and the smaller
its traffic with each user. This way, the traffic near the apex
of the tree is not substantially higher than that on the lower
levels. This condition differs from program parallelism, in
which all processors and resources communicate with all
others in the tree in a uniform manner.

Figure 6. Typical configuration of the DICE architecture.

Each dashed square, called a station, includes the hard¬ 
ware dedicated to each user. This hardware includes one or 
more processing elements (PEs) and a CE that connects to 
the rest of the system. A station (in the context of the DICE 
architecture) is a board or a group of boards sharing the 
same rack with other stations (not one system connected to 
others by a local area network). A PE includes a processor 
and its private memory, but one could build a station with 
processors that share a common memory. This approach 
would create a very effective organization for a limited number 
of processors. Thanks to CEs, this organization does not 
compromise extensibility in any way. 

A possible organization consists of stations 1 and 2 shar¬ 
ing disk 1, while station 3 resorts to disk 2. If disk 1 did not 
exist for some reason, all stations could easily share disk 2. 
In any organization, any user would be allowed to log in at 
any station. Better disk performance obviously occurs if the 
user logs in as closely as possible to the disk(s) containing 
needed files. The upper CE connects to the outer world 
(where X terminals, for instance, can exist) and is shared by 
all stations. 

Note that some stations can act as pools of unassigned
processors to be used by whoever needs them. A station
can migrate processes to another station, although with
reduced priority if that station is assigned to a user. This
nevertheless increases throughput if that station’s user is not
logged in or is not using all assigned processors all the time
(which is usually the case in interactive programming).

A conventional workstation corre¬ 
sponds to one station with one proces¬ 
sor, one or more CEs (for Ethernet 
interface, graphics hardware, etc.) and 
one or more disks. 

Implementation status. Machines 
such as the one briefly described do 
not exist yet. Very few proposals have 
been made for such a system, and most 
of them consist of conventional com¬ 
puters interconnected by a LAN or some 
high-speed network. The architecture 
most similar to DICE seems to be the 
one proposed by Feitelson, 10 which 
presents a multiuser parallel computer 
(not yet built). The basic difference is 
that Feitelson’s computer has strict 
controls. A tree structure of controllers 
closely oversees the PEs and coordinates the scheduling of 
processes (gang scheduling). DICE designers preferred to 
treat processes independently and to rely on a dynamic 
load-balancing algorithm to spread processes over avail¬ 
able hardware. Control of Feitelson’s architecture is hier¬ 
archical and related to the controllers’ network, whereas 
control is truly distributed, without hierarchy, in DICE. 

The SPAN project detailed specifications of a preliminary 
version of DICE and built a simulator that ran on transputers. 
A VLSI design of the DICE processor used standard cells for 
quick prototyping. Unfortunately, this design resulted in a 
200-sq-mm die (with 2-micrometer technology); its fabrication was 
never pursued. The simulator was implemented to exactly 
mimic the hardware structure of the VLSI processor (so that 
the software could exercise the hardware bugs). Consequently, 
the simulation was unacceptably slow (about 7,000 instruc¬ 
tions per second on an Inmos T800 transputer). Neverthe¬ 
less, this speed was enough to build a kernel that provided 
basic services and ran a few demonstration programs writ¬ 
ten in Parle and compiled to the DICE machine code. (Further 
details appear elsewhere. 11 ) 

We chose to adopt an object-oriented model for the DICE 
processor because this would be a step toward providing 
hardware support for the increasingly popular object-oriented 
programming paradigm. In addition, the object organiza¬ 
tion, enforced by hardware through the use of tags, would 
provide a protection not possible with more conventional 
hardware. 

The development of DICE continued after the SPAN project 
officially concluded. We are now designing a version to tackle 
many points left open in the first simplified version. The
features include

• distributed, shared, virtual memory with support for 
replicated (in several processors) objects with guaran¬ 
teed consistency, 

• increased protection, and 

• support for common operating system features. 

We are considering the reimplementation of Mach 
(starting only from its interface specification), since a simple 
port would completely miss the capabilities of the hard¬ 
ware. (Mach is a parallel distributed operating system 
developed by Carnegie Mellon University.) A preliminary 
analysis indicates that this task is much easier than writing 
Mach for a conventional architecture, given the hardware 
support available. For example, Mach ports are protected 
by nature (the hardware provides capabilities). 

We are now designing a new VLSI version of the DICE 
processor (this time using a full-custom VLSI circuit) and 
expect a preliminary chip to be working by the end of 
1991.
Neurocomputing

continued from p. 31


/* Topology, Data & Function Information */

#define system_variable constant
struct {...} config = {...};
typedef struct {
    float state (constant);          /* status info. */
    synapse_type *synapse;           /* status info. */
    neuron_type **input_neuron;      /* topological info. */
    neuron_type **outward_neuron;    /* topological info. */
    rule_type weight_sum;            /* functional info. */
    rule_type weight_update; ...     /* functional info. */
}
neuron_type;
struct {
    net_def;                         /** definition of network **/
} system;

/* Control Information */

/* System function definitions */
connect ( ... )
read_weights (name_file) { ... }
main ( )
{   /* calls to system functions to control application */
    connect ()
    read_weights (file_name)
    read_states (file_name)
    learn ( )
    recall () ...


Figure 4. The NC framework for a neural network description. 



Figure 5. The neural processor. CR indicates a customizable register. 


volving a cascadable neuron processor 
with local control and state value circu¬ 
lation through a shift register with no 
external control. We realized a 20-MHz 
chip with only one neuron; tests on it 
proved positive. 

We can customize this architecture 
since its parameters are the number of 
neurons, layers, and interconnections; 
the state and coefficient precision; and 
the learning capabilities. Figure 5 depicts 
the neural processor. Such processors 
can be put together in a global architec¬ 
ture as shown in Figure 6. 

Image processing 

We tried, in this project, to introduce 
neural techniques step by step in the 
image processing domain. We began to 
solve partial (high-level or low-level) 
problems. Next, we plan to develop co¬ 
herent, complete applications for op¬ 
eration in our future ESPRIT project 
Galatea. Such applications would be ca¬ 
pable of handling complex problems and 
showing the coupling of different neu¬ 
ral networks and classical techniques to 
obtain a global result. 

We thus consider the two years of 
Pygmalion as a first step of exploration 
in the domain of image processing. 

Among all the possible application domains of image
processing, we chose two very different kinds of pictures,
each of interest for its own reasons. Remote sensing data has
several domains of application: geology, agriculture,
meteorology, hydrology, and cartography, among others.
Moreover, researchers have already studied this type of image
with classical techniques, especially in all areas that concern
low-level processing; a lot of results are available for
comparison. Nevertheless, processing this type of data is
rather difficult, even by classical techniques. Moreover, we
have no flexibility when dealing with these images. One is
constrained by the precise view one is considering.
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preprocessings can be done using different areas of the 
first layer. Then, the training algorithm (Back Propagation) 
defines the proper preprocessing(s) and performs the 
classification. 

For 3D, we start with several images corresponding to a 
view of the same 3D objects from different angles. We 
want to be able to recognize automatically a 3D workpiece 
from a set of 3D objects, independently of their orientation 
or position on a plane. We use an associative memory al¬ 
gorithm to perform this task. 1 This application is still at a 
preliminary stage. 

Finally, we compared these image-processing neural 
classifiers to more classical techniques, such as statistical 
methods and syntactical pattern-recognition techniques, for 
which we have available extensive experimental data. 


Figure 6. Putting together neural processors. 

This is not the case with the other type of data we are 
considering, namely workpieces in a factory automation 
context. This problem is simpler and more tractable: One 
can easily choose the pictures one wants to deal with, ac¬ 
cording to the type of problem selected. We can roughly 
divide the different tasks to be investigated into low-level 
image processing performed on Spot images and high-level 
processing performed on workpiece images, after segmen¬ 
tation by classical techniques. 

Low-level processing includes 1) compression using learning 
by back-propagation on a neural network in autoassociation 
and 2) segmentation into homogeneous regions via the 
combined approach of edge and region detection. Before 
the end of the project, we plan to perform a stereovision 
application. 

We have only considered supervised segmentation. The 
use of the back-propagation algorithm to produce edges 
or to classify different textual regions, after extraction of 
local features, proved very efficient. High-level processing 
mainly concerns classification and pattern recognition in 
two and three dimensions. 

For 2D, we considered two different approaches. In the 
first one we input a description of the object in terms of 
statistical or syntactical features to a neural network used 
as a classifier. During the learning process, the network 
evaluates the features and selects the most relevant ones. 
We compared several types of neural networks (Hopfield’s 
network, the multilayer Perceptron, Kohonen’s topological 
maps). 

In the second approach we used the image directly (not 
preprocessed) to feed the neural network. The approach is 
based on a constrained multilayer Perceptron in which the 
first layer preprocesses the data. The synaptical coefficients 
are forced to be invariant with respect to translation and/or
rotation of the objects presented to the network. Several


Speech and signal processing 

Researchers have tried many heuristic and even sophisticated
methods for automatic speech recognition in the past.
Whereas progress in many other fields of technology has
been astonishingly rapid, research investments in this
“natural” task have not yet yielded adequate results. After
initial optimism, researchers became more and more aware
of the many difficulties to be overcome. Researchers now
place considerable hope in artificial neural networks. They
view them as a new method within this special area of
pattern recognition that is capable of overcoming current
problems and increasing recognition-system performance.

Noise interference, speaker independence, and a large
vocabulary are the main problem areas in current speech
recognition. In traditional systems, features must be traded for
other features to maintain a certain level of performance,
for example, vocabulary size against speaker independence
or against robustness in noise. In contrast, the characteristics
of neural networks identified so far lead us to expect
that they can provide these system features together,
thus offering an enlarged capability profile with at least
comparable recognition rates. For an arbitrary selection of
speech-recognition tasks, researchers have already
experimentally demonstrated the efficiency of neural networks.

The objective of the speech processing application in
the Pygmalion project is to investigate a variety of
artificial neural network architectures. These architectures,
combined with appropriate training methods implemented
by efficient learning algorithms for individual topics,
provide several features. They are:

1) Isolated word recognition. This task focuses on
speaker-adaptive isolated word recognition (IWR) for a
medium-size vocabulary (about 100 words) in real office
environments.

2) Speaker independence and adaptivity. We also plan to
study low-level preprocessing of speech signals with
respect to feature extraction and noise reduction in
relation to speaker-independent IWR for a limited
vocabulary. We plan to show that IWR networks have
feature extraction capabilities, which could be reused
later on in a continuous speech recognizer.

3) Speech signal preprocessing, including noise reduction. 
Coding becomes a very important problem for speech 
recognition with neural networks. We must consider 
encoding the raw speech signals and extracting the 
relevant features, together with the form in which we 
plan to present the segmental (and possibly the
suprasegmental) information.

4) IWR in a noisy environment. This task is devoted to 
speaker-independent recognition of isolated words in 
a telecommunications environment. It includes the design
and implementation of a small-vocabulary,
isolated-word recognizer whose accuracy and robustness make
it suitable for a telephonic application. Even if a very
small vocabulary is used (for example, 10 digits plus
some command words such as “help” and “repeat”),
many possible automatic services could be offered to
telephone subscribers.

5) Subword-unit recognition and coarticulation. This task
deals with discrimination, coarticulation, and subword
units. A language parses speech into sensitive language
units. If we consider language as a compound of
self-organized systems, it is very important to know
how consonants and vowels self-organize and combine
themselves into discrete units. This problem of
coarticulation and subword units is a fundamental and
critical aspect of multispeaker situations. The goal of
this speech-processing research is to show that a
nonarbitrary structure of discrete units exists inside
words that permits good multispeaker word
recognition.

The last field of application we are exploring is the
classification of underwater natural sounds. A relatively small study
has tried to assess neural network capabilities compared to
many kinds of algorithms already developed and optimized.
We apply neurocomputing in two ways. We can use a
processed version of the signal, obtained through well-known
and efficient preprocessing algorithms. Or, we can directly
apply neural classification to the raw signal.

This study also illustrates how efficient neural
networks may be in terms of the amount of time and effort
needed to develop an application. Provided we build
basic knowledge of the type of signal to be treated
into the global structure of the network, automatic neural
adaptivity is quicker to implement than theoretical studies. It
is also more efficient than empirical research when
significant characteristics must be extracted from the signal before
the recognition process. We expect this field to give rise to
operational applications in the very near future.

Since the January 1989 start of the Pygmalion
project, we’ve made major progress on the environment
and the applications.

We've produced the graphics monitor and algorithm 
library and fully specified the high-level N and
intermediate-level NC languages.

We’ve made available for experimentation a first
version of the graphics monitor (operating on an NC
specification of a neural application) and the C version of the
algorithm library. A compiler from N to C++ is also available.
By the end of 1990, we project that the complete Pygmalion 
environment, including the final version of the graphics 
monitor, the compiler from N to NC, and a first version 
of the N library, will be finished. 

We plan to reuse and complete this software
environment in the ESPRIT II Galatea project; in particular, the
development teams of Thomson, Philips, and Siemens 
plan to use it. Thus, we expect this tool to become a de 
facto European standard. A first commercial product should 
be available for sale by mid-1991. 

The hardware integration study proved three points. 
Neurocomputing on silicon is feasible. Dedicated ASICs 
(application-specific ICs) can be generated in a short time 
starting from the building block that is the neuron
processor. Customization of neural networks through
silicon compilation techniques and PROM or soft-programmable
switches should be studied in a future European project, 
namely the Galatea project. 

We can state several general results from the
applications study. For applications in which efficient classical
preprocessing is known, results prove better with the
use of preprocessed signals than with the raw signal. But
when no efficient preprocessing is known, more effort
must be devoted to the architecture of the
network. Neural networks produce better results than classical
methods every time the designers of the networks have
taken advantage of their adaptive capacity. In other cases,
the results are roughly comparable to those of the classical
methods. In many cases, Back Propagation successfully
and easily solved classification problems, even if some
more sophisticated combinations of several algorithms
have sometimes outperformed Back
Propagation’s results.

We’ve successfully implemented many image processing
methods combined with classical methods, which will
constitute the basic building blocks for the future
operational applications to be developed in Galatea.

We demonstrated speech and signal processing
methods, proving in particular noise independence for
recognition tasks. Even if fundamental difficulties are still an
obstacle to efficient continuous speech recognition,
industrial applications are already possible, in particular in
the telecommunications environment.
sor (the 8085) and had done
so; it had every reason to
believe that it would have the
opportunity to second-source
Intel’s successor product; but
Intel failed to come through. In
1981 the tables were turned.

Intel badly needed a second
source for its 8086
microprocessor and had to come back
to AMD. But AMD
remembered the past. So the contract
negotiated in 1982 whereby
AMD would help Intel by
second-sourcing the Intel 8086
was basically very favorable to
AMD.

At the time, Intel needed an alternate
source to bolster its competitive
position against the Motorola 68000.
AMD expected to focus its development
efforts on peripheral chips, rather
than processors. These designs would
be transferred to Intel in exchange for
Intel’s processor designs. As the 8086
and 286 became successful, however,
Intel no longer needed AMD. Furthermore,
AMD’s peripheral chip designs
were late arriving and featured larger
die sizes than anticipated. Consequently,
the NIH (not invented here)
factor caused Intel’s product people to
“look upon all AMD products with a
jaundiced eye,” according to Phelps.

Intel unilaterally decided to end its
cooperation with AMD in 1984, but it 
kept this decision secret from AMD. 
Phelps noted that “Intel always thought 
the worst of AMD.” He characterized 
Intel’s actions as a classic example of a 
breach of contract—“preaching good 
faith, but practicing duplicity.” 

Phelps cited ineptitude rather than 
duplicity as AMD’s main failing: 

AMD management failed to
appreciate the time required for
and the enormous resources
needed in both capital and
personnel to change from LSI
[large-scale integration] to VLSI
[very LSI] technology, and to
adapt from NMOS [N-type
metal-oxide semiconductor] to
the CMOS [complementary
MOS] process.

Ironically, Intel’s developments
contained their share of problems, but the
company chose to focus its critical eye 
on AMD rather than itself. 

The "Big Case" 

AMD attempted to argue that Intel’s
failure to uphold the agreement denied
AMD the “benefit of the bargain.” As a
result, AMD contended it should
receive the 386 design from Intel. Phelps
dismissed this argument, called the
Big Case, by stating that AMD
assumed the contract between the two
firms could force Intel to turn the
product over to AMD in the absence of a
value-for-value exchange. The judge
termed this “an impossibility, both
explicitly and conceptually.”

Inexplicably, one page later, Phelps 
apparently contradicts his previous 
statement: “Whether AMD’s losses 
caused by Intel’s breaches of contract as 
found by this decision are sufficient to 
warrant a transfer of the 80386 to AMD 
will be determined in the remedies 
module.” 

The details of the particular product 
disagreements between Intel and AMD 
are too lengthy to describe here. A few 
aspects of Phelps’s ruling, however, are
worth highlighting. 

The judge chided Intel’s senior
management for refusing to transfer the
8087 to AMD “even though a senior
official at Intel in substantial charge of the
AMD/Intel relationship believed Intel
was legally obligated to transfer the part
to AMD.”

Intel also tried to persuade AMD to
waive its rights to “CF points” (the way
of measuring designs for the purpose of
exchanges between the two
companies) for a hard-disk controller product
in return for the 8087. Phelps said,
“AMD refused to give in to what it
correctly perceived to be extortion on the
part of Intel; and the 8087 has never
been transferred to this day.”

Another interesting situation
concerned the transfer of updates to the
286 design from Intel to AMD.
According to the ruling, Intel transferred
“deliberately incomplete, deliberately
indecipherable, and deliberately
unusable” information. AMD resorted to
reverse-engineering Intel’s E-step and
S-step versions of the 286 to continue to
produce competitive products. Phelps
called Intel’s actions “inexcusable.”

In one instance Phelps ruled that
AMD is liable to Intel. After transferring the
7910 Codec chip to Intel, AMD
developed a chip with new
features, the 7911, instead of
improving the 7910.

Intel contended that AMD breached
the contract by failing to transfer the
updates to Intel, but Phelps ruled that
the improvements in the 7911 were not
updates under the contract. Furthermore,
he said that Intel cancelled its
7910 for internal reasons and not
because of AMD’s marketing efforts on
behalf of the 7911. However, Phelps
ruled AMD did breach the contract by
attempting to “bad mouth” the 7910
then being produced at Intel at the time
AMD was in production with the 7911.

Will Intel appeal? 

Sadly, the waste of resources poured 
into this arbitration is not finished. The 
remedies phase promises to be as
contentious as, but presumably shorter
than, the liabilities arbitration.

Even then, the litigation may not be 
over. Should Phelps award AMD the 
hundreds of millions of dollars the 
company is demanding in damages, 
Intel is likely to appeal. 
After three years in arbitration, a judge
recently rendered a decision regarding the
ill-fated technology exchange pact
between Intel and AMD. The October 11 ruling by retired
superior court judge J. Barton Phelps
settled only the issue of liability. A remedies module
set for November 15 will address potential
damages; this phase won’t conclude until sometime
next year.

Who won? The lawyers were the clear
winners. At this point, they have conducted 313 days of
testimony, generated more than 42,000 pages of
transcripts, and referenced 2,093 exhibits.

To an extent, AMD both won and lost. It won
a moral victory (and perhaps a large cash
settlement) when Phelps decided that Intel did not
act in good faith and breached its contract with
AMD. But since AMD previously said that it
expected to receive rights to the 386 design as a
result of the arbitration, AMD also lost.

In essence, Phelps ruled that Intel breached its
contract with AMD, but it was not obligated to
transfer any designs under the agreement
because AMD failed to deliver acceptable designs
in return. However, he left open the possibility
that the remedies module may require Intel to
transfer the 386 designs after all. This is
inconsistent with the rest of the decision, which
mandates a “product-for-a-product, value-for-value
exchange.”

The judge’s line of reasoning suggests that only 
a monetary settlement is possible. Intel counsel 
F. Thomas Dunlap made it clear in a conference 
call following the decision that Intel viewed any 
product transfer stemming from the remedies 
module as improper. According to Dunlap, Intel 
is likely to appeal if such a transfer is mandated. 

The 386 transfer may become a moot point. At
AMD's press conference following the decision,
chief executive officer Jerry Sanders stood in front
of a plot of his firm’s 386-compatible design,
code-named “Longhorn,” and stated that AMD
will announce the product regardless of the
outcome of the arbitration.

The AMD 386 design uses Intel’s microcode. 
AMD asserts that it holds a license to use any 
Intel microcode as part of a 1976 agreement; 
Intel maintains that the agreement gives AMD 
the right to copy, but not to distribute, the
microcode. Pending litigation regarding the 287
math coprocessor will settle this issue. 

The outcome of the microcode litigation could
be far more important than the current
arbitration. AMD presumably developed a clean-room
version of the microcode, but it is likely to
contain compatibility problems. If AMD loses the
microcode copyright case, and if it doesn’t receive
rights to the 386 in the remedies module, the
company will have to pull the part off the market 
and face a suit by Intel for any profits made from 
selling the chip. 

Management at fault 

The arbitration decision was a stinging
indictment of the management of both companies.
Judge Phelps concluded the situation arose from
many causes. As competitors, Intel and AMD
found it difficult to adjust to a contract calling
for mutual cooperation. Other contributing
factors included a contract that heavily favored AMD
and the failures of senior management on both
sides to carry out their contract responsibilities.

The judge offered some background leading 
to the ill-fated 1982 agreement between the firms: 
In the late 1970s, AMD was left hanging 
out to dry by Intel. AMD had agreed to 
second-source Intel’s 8-bit microproces- 

continued on p. 103